Paper Title

Transformer on a Diet

Authors

Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

Abstract

Transformer has been widely used thanks to its ability to capture sequence information in an efficient way. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness. In this paper, we explore three carefully-designed light Transformer architectures to figure out whether a Transformer with less computation can produce competitive results. Experimental results on language model benchmark datasets hint that such a trade-off is promising: the light Transformer reduces parameters by up to 70% while obtaining perplexity competitive with the standard Transformer. The source code is publicly available.
