Paper Title
Lite Transformer with Long-Short Range Attention
Paper Authors
Paper Abstract
Transformer has become ubiquitous in natural language processing (e.g., machine translation, question answering); however, it requires an enormous amount of computation to achieve high performance, which makes it unsuitable for mobile applications that are tightly constrained by hardware resources and battery. In this paper, we present an efficient mobile NLP architecture, Lite Transformer, to facilitate deploying mobile NLP applications on edge devices. The key primitive is Long-Short Range Attention (LSRA), in which one group of heads specializes in local context modeling (by convolution) while another group specializes in long-distance relationship modeling (by attention). Such specialization brings consistent improvement over the vanilla transformer on three well-established language tasks: machine translation, abstractive summarization, and language modeling. Under constrained resources (500M/100M MACs), Lite Transformer outperforms the transformer on WMT'14 English-French by 1.2/1.7 BLEU, respectively. Lite Transformer reduces the computation of the transformer base model by 2.5x with only 0.3 BLEU score degradation. Combined with pruning and quantization, we further compress the model size of Lite Transformer by 18.2x. For language modeling, Lite Transformer achieves 1.8 lower perplexity than the transformer at around 500M MACs. Notably, for the mobile NLP setting, Lite Transformer outperforms the AutoML-based Evolved Transformer by 0.5 BLEU without the costly architecture search that requires more than 250 GPU years. Code has been made available at https://github.com/mit-han-lab/lite-transformer.
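The abstract describes LSRA as a split between a convolutional branch for local context and an attention branch for long-distance dependencies. Below is a minimal sketch of that idea in PyTorch; the even channel split, kernel size, head count, and module names are illustrative assumptions, not the exact configuration of the paper or its released code.

```python
import torch
import torch.nn as nn


class LSRA(nn.Module):
    """Minimal sketch of Long-Short Range Attention (LSRA).

    The input features are split along the channel dimension: one branch
    models local context with a depth-wise convolution, the other models
    long-distance relations with multi-head self-attention, and the two
    outputs are concatenated back together. The 50/50 split, kernel size,
    and head count below are illustrative assumptions.
    """

    def __init__(self, embed_dim: int, num_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        assert embed_dim % 2 == 0, "embed_dim must be even for the 50/50 split"
        self.half = embed_dim // 2
        # Long-range branch: standard multi-head self-attention on half the channels.
        self.attn = nn.MultiheadAttention(self.half, num_heads, batch_first=True)
        # Short-range branch: depth-wise convolution along the sequence dimension.
        self.conv = nn.Conv1d(
            self.half, self.half, kernel_size,
            padding=kernel_size // 2, groups=self.half,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        x_attn, x_conv = x[..., :self.half], x[..., self.half:]
        # Global (long-range) branch.
        global_out, _ = self.attn(x_attn, x_attn, x_attn)
        # Local (short-range) branch; Conv1d expects (batch, channels, seq_len).
        local_out = self.conv(x_conv.transpose(1, 2)).transpose(1, 2)
        return torch.cat([global_out, local_out], dim=-1)


if __name__ == "__main__":
    layer = LSRA(embed_dim=128)
    tokens = torch.randn(2, 10, 128)   # (batch, seq_len, embed_dim)
    print(layer(tokens).shape)         # torch.Size([2, 10, 128])
```

In this sketch each branch only sees half of the embedding channels, so attention operates at reduced width while the depth-wise convolution adds little cost, which is consistent with the abstract's motivation of cutting computation for mobile deployment.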