Paper Title

Correct Me If You Can: Learning from Error Corrections and Markings

Authors

Julia Kreutzer, Nathaniel Berger, Stefan Riezler

Abstract

Sequence-to-sequence learning involves a trade-off between signal strength and annotation cost of training data. For example, machine translation data range from costly expert-generated translations that enable supervised learning, to weak quality-judgment feedback that facilitates reinforcement learning. We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings. We show that error markings for translations of TED talks from English to German allow precise credit assignment while requiring significantly less human effort than correcting/post-editing, and that error-marked data can be used successfully to fine-tune neural machine translation models.
