Paper Title

Can Automatic Post-Editing Improve NMT?

Authors

Shamil Chollampatt, Raymond Hendy Susanto, Liling Tan, Ewa Szymanska

Abstract

Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE has had notable success when used with statistical machine translation (SMT) systems but has not been as successful with neural machine translation (NMT) systems. This has raised questions about the relevance of the APE task in the current scenario. However, the training of APE models has been heavily reliant on large-scale artificial corpora combined with only limited human post-edited data. We hypothesize that APE models have been underperforming in improving NMT translations due to the lack of adequate supervision. To test our hypothesis, we compile a larger corpus of human post-edits of English-to-German NMT. We empirically show that a state-of-the-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field. We further investigate the effects of varying training data sizes, using artificial training data, and domain specificity for the APE task. We release this new corpus under the CC BY-NC-SA 4.0 license at https://github.com/shamilcm/pedra.
