Title
Cross-model Back-translated Distillation for Unsupervised Machine Translation
Authors
Abstract
Recent unsupervised machine translation (UMT) systems usually employ three main principles: initialization, language modeling, and iterative back-translation, though they may apply them differently. Crucially, iterative back-translation and denoising auto-encoding for language modeling provide data diversity to train the UMT systems. However, the gains from these diversification processes have seemed to plateau. We introduce a novel component to the standard UMT framework called Cross-model Back-translated Distillation (CBD), which aims to induce another level of data diversification that the existing principles lack. CBD is applicable to all previous UMT approaches. In our experiments, CBD achieves the state of the art in the WMT'14 English-French, WMT'16 English-German, and English-Romanian bilingual unsupervised translation tasks, with 38.2, 30.1, and 36.3 BLEU respectively. It also yields 1.5-3.3 BLEU improvements in the IWSLT English-French and English-German tasks. Through extensive experimental analyses, we show that CBD is effective because it embraces data diversity while other similar variants do not.