Paper Title
Fast Interleaved Bidirectional Sequence Generation
Paper Authors
Paper Abstract
Independence assumptions during sequence generation can speed up inference, but parallel generation of highly inter-dependent tokens comes at a cost in quality. Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder by simply interleaving the two directions and adapting the word positions and self-attention masks. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer, and on five machine translation tasks and two document summarization tasks, achieves a decoding speedup of ~2X compared to autoregressive decoding with comparable quality. Notably, it outperforms left-to-right SA because the independence assumptions in IBDecoder are more felicitous. To achieve even higher speedups, we explore hybrid models where we either simultaneously predict multiple neighbouring tokens per direction, or perform multi-directional decoding by partitioning the target sequence. These methods achieve speedups of 4X-11X across different tasks at the cost of <1 BLEU or <0.5 ROUGE (on average). Source code is released at https://github.com/bzhangGo/zero.
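The core mechanics described in the abstract (interleaving the target from both ends, assigning per-direction word positions, and building a block-wise causal self-attention mask) can be illustrated with a short sketch. The following Python/NumPy snippet is a minimal, illustrative sketch, not the authors' released implementation (see https://github.com/bzhangGo/zero for that); the function names, the position scheme (offset from each token's own starting end), and the choice to block attention between the two tokens emitted at the same step are assumptions made here for clarity.

```python
# Illustrative sketch of the data preparation behind an interleaved
# bidirectional decoder: interleave the target from both ends, assign
# per-direction positions, and build a block-causal self-attention mask.
# Names and conventions are assumptions, not the paper's exact recipe.
import numpy as np

def interleave(tokens):
    """Interleave a target sequence from its two ends:
    [y1, y2, ..., yn] -> [y1, yn, y2, y(n-1), ...]."""
    left, right = 0, len(tokens) - 1
    out = []
    while left <= right:
        out.append(tokens[left])
        if right != left:            # avoid duplicating the middle token
            out.append(tokens[right])
        left += 1
        right -= 1
    return out

def directional_positions(length):
    """Assumed position scheme: each token is indexed by its offset from
    the end of the sequence it grows from (0, 0, 1, 1, 2, 2, ...)."""
    return [i // 2 for i in range(length)]

def block_causal_mask(length, block=2):
    """Self-attention mask: a token may attend to itself and to every token
    generated at earlier steps (earlier blocks of size 2), but not to the
    token produced simultaneously from the other direction, reflecting the
    independence assumption between the two directions at each step."""
    mask = np.zeros((length, length), dtype=bool)
    for i in range(length):
        mask[i, : (i // block) * block] = True  # all earlier blocks
        mask[i, i] = True                       # itself
    return mask

if __name__ == "__main__":
    seq = ["y1", "y2", "y3", "y4", "y5"]
    inter = interleave(seq)
    print(inter)                               # ['y1', 'y5', 'y2', 'y4', 'y3']
    print(directional_positions(len(inter)))   # [0, 0, 1, 1, 2]
    print(block_causal_mask(len(inter)).astype(int))
```

With this layout, decoding proceeds in blocks of two tokens per step (one per direction), which is where the ~2X speedup over token-by-token autoregressive decoding comes from; predicting more tokens per direction or partitioning the target into more segments, as in the hybrid models above, enlarges the block and yields the higher speedups reported.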