An EM Approach to Non-autoregressive Conditional Sequence Generation

Paper Title

An EM Approach to Non-autoregressive Conditional Sequence Generation

Authors

Zhiqing Sun, Yiming Yang

Abstract

Autoregressive (AR) models have been the dominating approach to conditional sequence generation, but are suffering from the issue of high inference latency. Non-autoregressive (NAR) models have been recently proposed to reduce the latency by generating all output tokens in parallel but could only achieve inferior accuracy compared to their autoregressive counterparts, primarily due to a difficulty in dealing with the multi-modality in sequence generation. This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization (EM) framework. In the E-step, an AR model learns to approximate the regularized posterior of the NAR model. In the M-step, the NAR model is updated on the new posterior and selects the training examples for the next AR model. This iterative process can effectively guide the system to remove the multi-modality in the output sequences. To our knowledge, this is the first EM approach to NAR sequence generation. We evaluate our method on the task of machine translation. Experimental results on benchmark data sets show that the proposed approach achieves competitive, if not better, performance with existing NAR models and significantly reduces the inference latency.
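The multi-modality problem named in the abstract can be illustrated with a small toy computation. The sketch below is not the paper's algorithm: the function names (`position_marginals`, `invalid_mass`) and the two German phrases are hypothetical, chosen only for illustration. It assumes a fully factorized model that predicts each output position independently from per-position token frequencies, and measures how much probability such a model places on sequences that never occur among the reference targets.

```python
from collections import Counter
from itertools import product


def position_marginals(targets):
    """Per-position token distributions estimated from equal-length targets."""
    length = len(targets[0])
    marginals = []
    for i in range(length):
        counts = Counter(t[i] for t in targets)
        total = sum(counts.values())
        marginals.append({tok: c / total for tok, c in counts.items()})
    return marginals


def invalid_mass(targets):
    """Probability that independent per-position sampling yields a
    sequence that is not one of the reference targets."""
    marginals = position_marginals(targets)
    valid = set(targets)
    mass = 0.0
    # Enumerate every combination of (token, probability) pairs across positions.
    for combo in product(*(m.items() for m in marginals)):
        tokens = tuple(tok for tok, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        if tokens not in valid:
            mass += prob
    return mass


# Two equally likely reference translations (two modes) for the same source.
multi_modal = [("vielen", "dank"), ("danke", "schön")]
# The same source after the targets have been reduced to a single mode.
single_mode = [("vielen", "dank"), ("vielen", "dank")]

print(invalid_mass(multi_modal))  # 0.5
print(invalid_mass(single_mode))  # 0.0
```

With two modes, half of the factorized model's probability falls on mixed outputs such as "vielen schön"; with a single mode, none does. This is the sense in which training targets with fewer modes are easier for a parallel decoder to fit.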

Paper Title