Paper Title
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Paper Authors
Paper Abstract
In this work, we present a novel approach for simultaneous knowledge transfer and model compression called Weight Squeezing. With this method, we perform knowledge transfer from a teacher model by learning a mapping from its weights to the weights of a smaller student model. We applied Weight Squeezing to a pre-trained text classification model based on the BERT-Medium model and compared our method with various other knowledge transfer and model compression methods on the GLUE multitask benchmark. We observed that our approach produces better results while being significantly faster than other methods for training student models. We also proposed a variant of Weight Squeezing called Gated Weight Squeezing, in which we combined fine-tuning of the BERT-Medium model with learning a mapping from BERT-Base weights. We showed that fine-tuning with Gated Weight Squeezing outperforms plain fine-tuning of the BERT-Medium model as well as other concurrent SoTA approaches, while being much easier to implement.
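The abstract does not specify the concrete form of the weight mapping or the gate, so the following is only a minimal PyTorch sketch of one plausible reading: the mapping is assumed to be a pair of learned linear projections that shrink both dimensions of each teacher weight matrix, and the gate is assumed to be a learned scalar blending a directly trained student weight with the mapped teacher weight. All names here (WeightSqueezing, GatedWeightSqueezing, proj_out, proj_in, gate) are hypothetical, not from the paper.

```python
# Hedged sketch of the idea described in the abstract, NOT the authors'
# implementation. Assumptions (not confirmed by the abstract): the mapping
# is a learned linear projection on both dimensions of a teacher weight
# matrix, and the gate is a learned scalar blending a trainable student
# weight with the mapped teacher weight.
import torch
import torch.nn as nn


class WeightSqueezing(nn.Module):
    """Maps a teacher weight matrix to a smaller student weight matrix."""

    def __init__(self, teacher_shape, student_shape):
        super().__init__()
        t_out, t_in = teacher_shape
        s_out, s_in = student_shape
        # Learned projections that shrink both dimensions of the teacher weight.
        self.proj_out = nn.Parameter(torch.randn(s_out, t_out) * 0.02)
        self.proj_in = nn.Parameter(torch.randn(t_in, s_in) * 0.02)

    def forward(self, teacher_weight):
        # (s_out, t_out) @ (t_out, t_in) @ (t_in, s_in) -> (s_out, s_in)
        return self.proj_out @ teacher_weight @ self.proj_in


class GatedWeightSqueezing(nn.Module):
    """Blends a directly fine-tuned student weight with the mapped teacher weight."""

    def __init__(self, teacher_shape, student_shape):
        super().__init__()
        self.squeeze = WeightSqueezing(teacher_shape, student_shape)
        self.student_weight = nn.Parameter(torch.randn(*student_shape) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5 at init

    def forward(self, teacher_weight):
        g = torch.sigmoid(self.gate)
        return g * self.student_weight + (1 - g) * self.squeeze(teacher_weight)


# Usage: produce a smaller student weight from a frozen teacher weight,
# e.g., squeezing a 768x768 BERT-Base projection down to 512x512
# (BERT-Medium hidden size). The shapes are illustrative only.
teacher_w = torch.randn(768, 768)
gws = GatedWeightSqueezing(teacher_w.shape, (512, 512))
student_w = gws(teacher_w)
print(student_w.shape)  # torch.Size([512, 512])
```

Under this reading, gradients from the student's task loss flow into the projections, the gate, and the directly trained weight, so knowledge transfer and fine-tuning happen in a single training pass; how the paper actually parameterizes the mapping may differ.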