Paper Title

On Romanization for Model Transfer Between Scripts in Neural Machine Translation

Authors

Chantal Amrhein, Rico Sennrich

Abstract

Transfer learning is a popular strategy to improve the quality of low-resource machine translation. For an optimal transfer of the embedding layer, the child and parent model should share a substantial part of the vocabulary. This is not the case when transferring to languages with a different script. We explore the benefit of romanization in this scenario. Our results show that romanization entails information loss and is thus not always superior to simpler vocabulary transfer methods, but can improve the transfer between related languages with different scripts. We compare two romanization tools and find that they exhibit different degrees of information loss, which affects translation quality. Finally, we extend romanization to the target side, showing that this can be a successful strategy when coupled with a simple deromanization model.
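The information loss mentioned in the abstract stems from romanization being a many-to-one mapping: distinct source characters can collapse to the same Latin string, so deromanization is ambiguous. A minimal sketch of this effect, using a tiny hand-written Cyrillic table purely for illustration (not either of the tools compared in the paper):

```python
# Toy romanization table (illustrative only): note that both "е" and "э"
# map to the Latin "e", so the mapping is many-to-one and not invertible.
ROMAN = {
    "а": "a", "е": "e", "э": "e", "и": "i",
    "к": "k", "о": "o", "с": "s", "т": "t",
}

def romanize(text: str) -> str:
    """Map each character through the table, passing unknown characters through."""
    return "".join(ROMAN.get(ch, ch) for ch in text.lower())

# Two distinct Cyrillic strings collapse to the same Latin string,
# so a deromanization model must disambiguate from context:
print(romanize("сет"))  # → "set"
print(romanize("сэт"))  # → "set"
```

This is why, as the abstract notes, romanization is not always superior to simpler vocabulary transfer methods, and why target-side romanization needs a deromanization model to recover the original script.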
