Paper Title

Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation

Authors

Rajiv Movva, Jason Y. Zhao

Abstract

Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing Transformers with more and more low-magnitude weights pruned away, we find that complex semantic information is first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts. Meanwhile, early layers of sparse models begin to perform more encoding. Attention mechanisms remain remarkably consistent as sparsity increases.
