Title


Neural Baselines for Word Alignment

Authors

Anh Khoa Ngo Ho, François Yvon

Abstract


Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems, or to perform quality estimation. In most areas of natural language processing, neural network models nowadays constitute the preferred approach, a situation that might also apply to word alignment models. In this work, we study and comprehensively evaluate neural models for unsupervised word alignment for four language pairs, contrasting several variants of neural models. We show that in most settings, neural versions of the IBM-1 and hidden Markov models vastly outperform their discrete counterparts. We also analyze typical alignment errors of the baselines that our models overcome, to illustrate the benefits and the limitations of these new models for morphologically rich languages.
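For context, the discrete IBM-1 baseline that the abstract's neural models are compared against can be trained with a few lines of expectation-maximization. The sketch below is a minimal, generic IBM Model 1 implementation on a toy corpus, not the paper's code or its neural variants; the function names `train_ibm1` and `align` are illustrative choices.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(f|e).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like mapping (target_word, source_word) -> probability.
    """
    src_vocab = {e for src, _ in corpus for e in src}
    # Uniform initialization; any constant works since the E-step normalizes.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in corpus:
            for f in tgt:
                # E-step: distribute one count for f over all source words.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) from expected counts.
        t = defaultdict(lambda: 1e-12,
                        {fe: count[fe] / total[fe[1]] for fe in count})
    return t

def align(t, src, tgt):
    """Greedy alignment: link each target word to its most likely source word."""
    return [max(range(len(src)), key=lambda i: t[(f, src[i])]) for f in tgt]
```

On the classic `das Haus / the house` toy corpus, EM concentrates probability on the correct word pairs after a handful of iterations; the neural versions discussed in the paper replace the count-based table `t` with network-parameterized distributions.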
