Paper Title
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Paper Authors
Paper Abstract
We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding. Unlike previous efforts (e.g., non-autoregressive decoding) that speed up seq2seq generation at the cost of quality loss, our approach aims to yield identical (or better) generation compared with autoregressive decoding but with a significant speedup, achieved by innovative cooperation between aggressive decoding and verification, both of which are efficient due to parallel computing. We propose two Aggressive Decoding paradigms for two kinds of seq2seq tasks: 1) For seq2seq tasks whose inputs and outputs are highly similar (e.g., Grammatical Error Correction), we propose Input-guided Aggressive Decoding (IAD), which aggressively copies the input sentence as drafted decoded tokens to verify in parallel; 2) For other general seq2seq tasks (e.g., Machine Translation), we propose Generalized Aggressive Decoding (GAD), which first employs an additional non-autoregressive decoding model for aggressive decoding and then verifies in parallel in an autoregressive manner. We test Aggressive Decoding with the popular 6-layer Transformer model on a GPU across multiple seq2seq tasks: 1) For IAD, we show that it can introduce a 7x-9x speedup for the Transformer in Grammatical Error Correction and Text Simplification, with results identical to greedy decoding; 2) For GAD, we observe a 3x-5x speedup with identical or even better quality in two important seq2seq tasks: Machine Translation and Abstractive Summarization. Moreover, Aggressive Decoding can benefit even more from stronger computing devices that are better at parallel computing. Given its lossless quality and significant, promising speedup, we believe Aggressive Decoding may evolve into a de facto standard for efficient and lossless seq2seq generation in the near future.
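
To make the draft-then-verify loop concrete, below is a minimal Python sketch of Input-guided Aggressive Decoding under simplifying assumptions: parallel_greedy is a hypothetical wrapper around one Transformer decoder forward pass that returns the greedy next-token prediction at every position in a single parallel call, and re-drafting after a mismatch simply copies the remaining input (the paper's re-alignment of the draft is more careful). None of these names come from the paper's code; they are illustrative only.

    from typing import Callable, List

    def aggressive_decode(
        parallel_greedy: Callable[[List[int], List[int]], List[int]],
        src: List[int],
        bos: int,
        eos: int,
        max_len: int = 200,
    ) -> List[int]:
        """Sketch of IAD: draft by copying the input, then verify in parallel."""
        out: List[int] = []
        draft = list(src)  # aggressive draft: assume output == input (GEC-style)
        while draft and len(out) < max_len:
            # One parallel decoder pass over prefix + draft; preds[i] is the
            # greedy prediction for the token following dec_input[i].
            dec_input = [bos] + out + draft
            preds = parallel_greedy(src, dec_input)
            base = len(out)  # preds[base + k] predicts draft[k]
            k = 0
            while k < len(draft) and preds[base + k] == draft[k]:
                k += 1  # longest draft prefix the model agrees with
            out.extend(draft[:k])
            if k == len(draft):
                break  # entire draft verified (simplification: stop here)
            out.append(preds[base + k])  # first disagreement: keep the model's token
            if preds[base + k] == eos:
                break
            draft = draft[k + 1:]  # simplified re-draft: copy the remaining input;
                                   # the paper re-aligns the draft more carefully
        return out

In GAD, draft would instead come from a single pass of a non-autoregressive drafting model, while verification stays the same. In both paradigms, every accepted token matches what greedy autoregressive decoding would have produced, which is why the acceleration is lossless.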