Paper Title
Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation
Paper Authors
Paper Abstract
Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders applying them in resource-limited environments. This motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of the ensemble teacher, but then those subnetworks are properly averaged for inference, giving a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversity of the teachers can be better transferred to the student. Combining these two, our method significantly improves upon previous methods on various image classification tasks.
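The weight-averaging step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes the subnetworks share an identical architecture so their parameters can be averaged elementwise, and the function name `average_subnetworks` and the flat dict-of-lists parameter layout are hypothetical.

```python
def average_subnetworks(subnet_params):
    """Collapse K same-shaped subnetworks into one student by
    averaging each named parameter elementwise, so inference costs
    the same as a single network."""
    k = len(subnet_params)
    return {
        name: [sum(p[name][i] for p in subnet_params) / k
               for i in range(len(subnet_params[0][name]))]
        for name in subnet_params[0]
    }

# Toy example: three "subnetworks", each a dict of flat weight vectors.
subnets = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [3.0]},
    {"w": [5.0, 6.0], "b": [6.0]},
]
student = average_subnetworks(subnets)
print(student)  # {'w': [3.0, 4.0], 'b': [3.0]}
```

In a deep-learning framework the same idea would operate on the per-subnetwork parameter tensors (e.g. entries of a state dict) rather than Python lists; the key point is that averaging happens once, after training, yielding a single student with no extra inference cost.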