Paper Title

Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Authors

Fei Huang, Hao Zhou, Yang Liu, Hang Li, Minlie Huang

Abstract

Non-autoregressive Transformers (NATs) significantly reduce decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between tokens for generating multiple possible translations. In this paper, we propose the Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of the WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, making it the first NAT model to achieve results competitive with autoregressive Transformers without relying on knowledge distillation.
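To make the "each path corresponds to a specific translation" idea concrete, here is a minimal toy sketch. It is not the authors' implementation: the vertex tokens, the transition probabilities, and the helper names (`enumerate_paths`, `greedy_path`, `to_text`) are all hypothetical and hand-written for illustration, whereas in the actual model such quantities would be predicted by the network.

```python
# Toy illustration only: a hand-built DAG, not the DA-Transformer model itself.
# Each vertex holds one token. Edges only go from lower to higher indices,
# which guarantees the graph is acyclic.

# Hypothetical vertex tokens (assumed values, chosen for the example).
tokens = ["<bos>", "I", "We", "agree", "concur", "<eos>"]

# Hypothetical transition probabilities:
# edges[i][j] = probability of moving from vertex i to vertex j.
edges = {
    0: {1: 0.6, 2: 0.4},
    1: {3: 0.7, 4: 0.3},
    2: {3: 0.2, 4: 0.8},
    3: {5: 1.0},
    4: {5: 1.0},
}


def enumerate_paths(edges, start, end):
    """Return every (path, probability) pair from start to end."""
    if start == end:
        return [([end], 1.0)]
    results = []
    for nxt, p in edges[start].items():
        for tail, tail_p in enumerate_paths(edges, nxt, end):
            results.append(([start] + tail, p * tail_p))
    return results


def greedy_path(edges, start, end):
    """Follow the most probable outgoing edge at every step."""
    path = [start]
    while path[-1] != end:
        successors = edges[path[-1]]
        path.append(max(successors, key=successors.get))
    return path


def to_text(path, tokens):
    """Join the tokens on a path, dropping the <bos> and <eos> markers."""
    return " ".join(tokens[v] for v in path[1:-1])


last = len(tokens) - 1
all_paths = enumerate_paths(edges, 0, last)
for path, prob in all_paths:
    print(f"{to_text(path, tokens)!r}: {prob:.2f}")
print("greedy:", to_text(greedy_path(edges, 0, last), tokens))
```

In this toy graph, a single DAG holds four candidate translations at once, and the path probabilities sum to one. Choosing one path fixes which tokens appear together, which is how a graph of this kind can keep alternative translations from being mixed within one output.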
