Paper Title
Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation
Paper Authors
Paper Abstract
The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation without considering the context dependency within documents, leading to inadequate document-level coherence. Some recent studies have tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences or even the entire document at once. Such methods may lose information on the target side or suffer from increasing computational complexity as documents grow longer. To address these problems, we introduce a recurrent memory unit into the vanilla Transformer, which supports information exchange between the current sentence and the previous context. The memory unit is recurrently updated by acquiring information from sentences and passing the aggregated knowledge back to subsequent sentence states. We follow a two-stage training strategy, in which the model is first trained at the sentence level and then fine-tuned for document-level translation. We conduct experiments on three popular datasets for document-level machine translation, and our model achieves an average improvement of 0.91 s-BLEU over the sentence-level baseline. We also achieve state-of-the-art results on TED and News, outperforming the previous work by 0.36 s-BLEU and 1.49 d-BLEU on average.
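To make the described mechanism concrete, below is a minimal PyTorch sketch of a recurrent memory unit that is read by each sentence's encoder states and then updated from them, carried forward across the sentences of a document. It is not the authors' released code: the class name `RecurrentMemory`, the number of memory slots, and the attention-based read/write rules are illustrative assumptions based only on the abstract.

```python
# Minimal sketch of a recurrent memory unit for document-level MT,
# assuming attention-based read/write updates (not the paper's exact design).
import torch
import torch.nn as nn


class RecurrentMemory(nn.Module):
    def __init__(self, d_model=512, num_slots=16, num_heads=8):
        super().__init__()
        # Learnable initial memory slots, shared across documents (assumption).
        self.init_memory = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        # Write step: memory slots attend over the current sentence's states.
        self.write_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Read step: sentence states attend over the memory to inject context.
        self.read_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.write_norm = nn.LayerNorm(d_model)
        self.read_norm = nn.LayerNorm(d_model)

    def initial_state(self, batch_size):
        return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1)

    def read(self, sent_states, memory):
        # sent_states: (B, src_len, d_model); memory: (B, num_slots, d_model)
        ctx, _ = self.read_attn(query=sent_states, key=memory, value=memory)
        return self.read_norm(sent_states + ctx)

    def update(self, memory, sent_states):
        delta, _ = self.write_attn(query=memory, key=sent_states, value=sent_states)
        return self.write_norm(memory + delta)


if __name__ == "__main__":
    B, d = 2, 512
    mem_unit = RecurrentMemory(d_model=d)
    memory = mem_unit.initial_state(B)
    # Process the sentences of a document in order, carrying the memory forward.
    for src_len in (7, 11, 5):                          # toy sentence lengths
        enc_states = torch.randn(B, src_len, d)         # stand-in for encoder outputs
        enc_states = mem_unit.read(enc_states, memory)  # inject document context
        memory = mem_unit.update(memory, enc_states)    # absorb the new sentence
    print(memory.shape)  # torch.Size([2, 16, 512])
```

Because the memory has a fixed number of slots, the per-sentence cost stays constant as the document grows, which is the property the abstract contrasts with approaches that translate with multiple sentences or the whole document at once.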