Syntax-aware Data Augmentation for Neural Machine Translation

Paper Title

Syntax-aware Data Augmentation for Neural Machine Translation

Authors

Sufeng Duan, Hai Zhao, Dongdong Zhang, Rui Wang

Abstract

Data augmentation is an effective way to improve neural machine translation (NMT) by generating additional bilingual data. In this paper, we propose a novel data augmentation strategy for neural machine translation. Unlike existing data augmentation methods, which simply select words for modification with the same probability across different sentences, we set a sentence-specific probability for word selection by considering each word's role in the sentence. We use the dependency parse tree of the input sentence as an effective clue for determining the selection probability of every word in each sentence. Our proposed method is evaluated on the WMT14 English-to-German dataset and the IWSLT14 German-to-English dataset. The results of extensive experiments show that our proposed syntax-aware data augmentation method may effectively boost existing sentence-independent methods, yielding significant improvements in translation performance.
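
The idea of sentence-specific selection probabilities can be illustrated with a minimal sketch. The abstract says only that the dependency parse tree serves as a clue for each word's selection probability; the scheme below is an assumption made for illustration, not the authors' confirmed formulation. It uses a softmax over tree depth so that deeper words are more likely to be selected, and replaces selected words with a placeholder. The function names, the `temperature` and `rate` parameters, and the `<blank>` placeholder are all hypothetical.

```python
import math
import random


def dependency_depths(heads):
    """Return the depth of each token in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    Assumes a well-formed tree (exactly one root, no cycles).
    """
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:
            node = heads[node]
            depth += 1
        depths.append(depth)
    return depths


def selection_probabilities(heads, temperature=1.0):
    """Softmax over depths, so the distribution depends on the sentence's own tree.

    Assumption for illustration: deeper tokens receive higher probability.
    """
    depths = dependency_depths(heads)
    weights = [math.exp(d / temperature) for d in depths]
    total = sum(weights)
    return [w / total for w in weights]


def augment(tokens, heads, rate=0.15, placeholder="<blank>", rng=None):
    """Replace tokens with a placeholder using sentence-specific probabilities.

    Each token is selected independently with probability
    min(1, rate * len(tokens) * p_i), so roughly rate * len(tokens)
    tokens are modified on average.
    """
    rng = rng or random.Random()
    probs = selection_probabilities(heads)
    out = []
    for tok, p in zip(tokens, probs):
        p_select = min(1.0, rate * len(tokens) * p)
        out.append(placeholder if rng.random() < p_select else tok)
    return out


# Hypothetical parse of "the cat sat on the mat" with "sat" as the root.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
heads = [1, 2, -1, 5, 5, 2]
probs = selection_probabilities(heads)
augmented = augment(tokens, heads, rate=0.3, rng=random.Random(0))
```

In this sketch the root verb "sat" gets the lowest selection probability and the determiners and preposition the highest. A sentence-independent baseline would instead apply the same probability to every token in every sentence.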
