Automatically Identifying Gender Issues in Machine Translation using Perturbations

Paper title

Automatically Identifying Gender Issues in Machine Translation using Perturbations

Authors

Hila Gonen, Kellie Webster

Abstract

The successful application of neural methods to machine translation has realized huge quality advances for the community. With these improvements, many have noted outstanding challenges, including the modeling and treatment of gendered language. While previous studies have identified issues using synthetic examples, we develop a novel technique to mine examples from real-world data to explore challenges for deployed systems. We use our method to compile an evaluation benchmark spanning examples for four languages from three language families, which we publicly release to facilitate research. The examples in our benchmark expose where model representations are gendered, and the unintended consequences these gendered representations can have in downstream applications.
