Paper Title

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Paper Authors

Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

Paper Abstract

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading comprehension. The code and pretrained checkpoints are available at https://github.com/laiguokun/Funnel-Transformer.
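
The abstract describes two mechanisms: an encoder that pools the sequence of hidden states into a shorter one between stacks of self-attention blocks, and a decoder that up-samples the compressed sequence back to full length so token-level pretraining objectives can still be applied. The PyTorch sketch below only illustrates that idea under simplified assumptions and is not the authors' released implementation; the class name FunnelSketch, the stage sizes, plain stride-2 mean pooling, and repetition-based up-sampling are illustrative choices, and the paper's actual pooling and decoder details differ.

```python
# Illustrative sketch of the funnel idea (not the authors' released Funnel-Transformer code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunnelSketch(nn.Module):
    def __init__(self, d_model=256, n_head=4, blocks_per_stage=(2, 2, 2), dec_layers=2):
        super().__init__()

        def block():
            return nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)

        # Encoder stages: the hidden-state sequence is halved by pooling between stages.
        self.stages = nn.ModuleList(
            nn.ModuleList(block() for _ in range(n)) for n in blocks_per_stage
        )
        # Decoder: a few extra self-attention layers applied after up-sampling,
        # so every token regains a deep representation for token-level objectives.
        self.decoder = nn.ModuleList(block() for _ in range(dec_layers))

    def forward(self, x):  # x: (batch, seq_len, d_model); seq_len divisible by 2 ** (n_stages - 1)
        full_len = x.size(1)
        for i, stage in enumerate(self.stages):
            if i > 0:
                # Compress: mean-pool adjacent positions with stride 2.
                x = F.avg_pool1d(x.transpose(1, 2), kernel_size=2, stride=2).transpose(1, 2)
            for layer in stage:
                x = layer(x)
        pooled = x  # shortened hidden sequence; sequence-level heads read from here
        # Up-sample back to the original length (simple repetition) and refine per-token states.
        up = pooled.repeat_interleave(full_len // pooled.size(1), dim=1)
        for layer in self.decoder:
            up = layer(up)
        return pooled, up


tokens = torch.randn(2, 64, 256)            # e.g. 64 token embeddings per sequence
pooled, per_token = FunnelSketch()(tokens)
print(pooled.shape, per_token.shape)        # torch.Size([2, 16, 256]) torch.Size([2, 64, 256])
```

In this sketch, sequence-level tasks such as text classification would read from the shortened pooled output, while token-level pretraining losses would use the up-sampled decoder output; the computation saved by running most layers on the shorter sequence is what the paper proposes re-investing in a deeper or wider model.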
