Paper Title
Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism
Paper Authors
Paper Abstract
Non-autoregressive models generate target words in parallel, which yields faster decoding at the sacrifice of translation accuracy. To remedy the flawed translations produced by non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM) and refine the generated results over several iterations. Unfortunately, such an approach hardly considers the \textit{sequential dependency} among target words, which inevitably results in translation degradation. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask into the same decoder of the CMTM, and then induce it to autoregressively review whether each word generated by the CMTM should be replaced or kept. The experimental results (WMT14 En$\leftrightarrow$De and WMT16 En$\leftrightarrow$Ro) demonstrate that our model requires dramatically less training computation than the typical CMTM, and outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
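To make the mechanism described in the abstract concrete, below is a minimal sketch (not the paper's released code) of how a single Transformer decoder might serve both passes: a bidirectional CMTM pass over masked targets, and a self-review pass that applies a left-to-right (causal) attention mask and decides whether to keep or replace each draft token. All names (SelfReviewCMTM, review_decode, keep_threshold) and the single-pass keep-or-replace rule are illustrative assumptions rather than details from the paper; the sketch assumes PyTorch, and positional encodings are omitted for brevity.

```python
# Minimal sketch, assuming PyTorch. Hypothetical names throughout; this is NOT the
# paper's implementation, only an illustration of sharing one decoder between a
# bidirectional CMTM pass and a causally-masked self-review pass.
import torch
import torch.nn as nn


class SelfReviewCMTM(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)  # shared by both passes
        self.proj = nn.Linear(d_model, vocab_size)

    def cmtm_pass(self, tgt_tokens, memory):
        # CMTM pass: no causal mask, so every (possibly masked) target position
        # attends to the whole target sequence plus the encoder memory.
        h = self.decoder(self.embed(tgt_tokens), memory)
        return self.proj(h)  # logits used to fill in masked positions

    def review_pass(self, draft_tokens, memory):
        # Self-review pass: the same decoder, but with a left-to-right mask, so the
        # judgment on position t only depends on the draft up to position t.
        T = draft_tokens.size(1)
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=draft_tokens.device), diagonal=1
        )
        h = self.decoder(self.embed(draft_tokens), memory, tgt_mask=causal)
        return self.proj(h)


@torch.no_grad()
def review_decode(model, draft_tokens, memory, keep_threshold=0.5):
    """Keep a draft token if the review pass still assigns it enough probability;
    otherwise replace it with the review pass's own top prediction (an assumed rule)."""
    probs = model.review_pass(draft_tokens, memory).softmax(-1)
    review_tokens = probs.argmax(-1)
    draft_prob = probs.gather(-1, draft_tokens.unsqueeze(-1)).squeeze(-1)
    return torch.where(draft_prob >= keep_threshold, draft_tokens, review_tokens)
```

The point the sketch tries to mirror is that both passes share one set of decoder parameters; only the self-attention mask differs, so the left-to-right review injects sequential information without adding a second decoder.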