Paper Title

A Multilingual View of Unsupervised Machine Translation

Authors

Xavier Garcia, Pierre Foret, Thibault Sellam, Ankur P. Parikh

Abstract

We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation. In addition to studying the vanilla case where there is only monolingual data available, we propose a novel setup where one language in the (source, target) pair is not associated with any parallel data, but there may exist auxiliary parallel data that contains the other. This auxiliary data can naturally be utilized in our probabilistic framework via a novel cross-translation loss term. Empirically, we show that our approach results in higher BLEU scores over state-of-the-art unsupervised models on the WMT'14 English-French, WMT'16 English-German, and WMT'16 English-Romanian datasets in most directions. In particular, we obtain a +1.65 BLEU advantage over the best-performing unsupervised model in the Romanian-English direction.
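To make the cross-translation idea from the abstract concrete, below is a minimal, hypothetical sketch (not the authors' code): given an auxiliary parallel pair in languages that exclude the zero-resource language, the current model first translates one side into the zero-resource language, and the resulting synthetic sentence is then paired with the known translation as a training signal. The `translate` and `nll_loss` interfaces are assumptions introduced for illustration only.

```python
# Hypothetical sketch of a cross-translation loss term (illustrative, not the paper's implementation).
# Assumes a multilingual model exposing:
#   model.translate(sentence, src_lang, tgt_lang) -> decoded sentence (no gradients)
#   model.nll_loss(src, tgt, src_lang, tgt_lang)  -> negative log-likelihood of tgt given src

def cross_translation_loss(model, x_aux, y_src, aux_lang, src_lang, zero_lang):
    """Exploit an auxiliary parallel pair (x_aux, y_src) in (aux_lang, src_lang)
    to create a synthetic training example involving zero_lang,
    the language with no parallel data."""
    # Step 1: translate the auxiliary-language sentence into the zero-resource
    # language with the current model (decoding is treated as fixed).
    pseudo_zero = model.translate(x_aux, src_lang=aux_lang, tgt_lang=zero_lang)

    # Step 2: train the model to recover the known translation y_src
    # from the synthetic zero-resource sentence.
    return model.nll_loss(pseudo_zero, y_src, src_lang=zero_lang, tgt_lang=src_lang)
```

In this sketch the auxiliary pair supervises translation out of the zero-resource language without ever requiring parallel data that contains it directly.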
