Paper Title
Transformer on a Diet
Paper Authors
Paper Abstract
Transformer has been widely used thanks to its ability to capture sequence information in an efficient way. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness. In this paper, we explore three carefully designed light Transformer architectures to figure out whether a Transformer with less computation can produce competitive results. Experimental results on language model benchmark datasets hint that such a trade-off is promising: the light Transformer reduces parameters by up to 70% at best while obtaining perplexity competitive with the standard Transformer. The source code is publicly available.
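The abstract does not spell out the three light architectures, but the core idea of trading parameters for comparable perplexity can be illustrated with a generic parameter-sharing variant. The sketch below (PyTorch, an assumption; the paper's released code may use a different framework and different techniques) ties a single encoder layer's weights across all depths, in the spirit of Universal Transformer or ALBERT. It is an illustration of the parameter/quality trade-off, not a reconstruction of the paper's actual method.

```python
# Minimal sketch: a weight-shared Transformer encoder versus a standard
# stacked one. NOT the paper's method; only an illustration of how a
# "light" Transformer can hold far fewer parameters at the same depth.
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Applies one encoder layer `num_layers` times, so the parameter
    count stays constant as depth grows."""
    def __init__(self, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead)
        self.num_layers = num_layers

    def forward(self, x):
        for _ in range(self.num_layers):
            x = self.layer(x)
        return x

def count_params(m):
    return sum(p.numel() for p in m.parameters())

standard = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(512, 8), num_layers=6)
shared = SharedLayerEncoder(d_model=512, nhead=8, num_layers=6)

print("standard:", count_params(standard))  # ~6x the per-layer count
print("shared:  ", count_params(shared))    # ~1x the per-layer count

x = torch.randn(10, 2, 512)  # (seq_len, batch, d_model)
y = shared(x)                # same interface, roughly 5/6 fewer weights
```

In this toy comparison the shared encoder drops about 83% of the stack's parameters, which is in the same spirit as (though not the same mechanism behind) the roughly 70% reduction reported in the abstract.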