When and Why is Unsupervised Neural Machine Translation Useless?

Paper Title

When and Why is Unsupervised Neural Machine Translation Useless?

Authors

Yunsu Kim, Miguel Graça, Hermann Ney

Abstract

This paper studies the practicality of the current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised and semi-supervised baselines with 50k-sentence bilingual data outperform the best unsupervised results. Our analyses pinpoint the limits of the current unsupervised NMT and also suggest immediate research directions.
