Paper Title
Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search
Paper Authors
Paper Abstract
Knowledge Distillation (KD) has recently emerged as a popular method for compressing neural networks. Recent studies have proposed generalized distillation methods that find the parameters and the architecture of the student model at the same time. However, these methods require substantial computation to search for architectures and consider only convolutional blocks in their search space. This paper introduces a new algorithm, coined Trust Region Aware architecture search to Distill knowledge Effectively (TRADE), which rapidly finds effective student architectures derived from several state-of-the-art architectures using a trust-region Bayesian optimization approach. Experimental results show that our proposed TRADE algorithm consistently outperforms both the conventional NAS approach and pre-defined architectures under KD training.
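To make the idea of trust-region Bayesian optimization over a student-architecture space concrete, the sketch below shows a generic TuRBO-style loop: candidates are sampled inside a box around the best architecture encoding found so far, scored by a simple Gaussian-process surrogate, and the box is expanded or shrunk based on successes and failures. This is a minimal illustration, not the authors' TRADE implementation; the 3-dimensional encoding, the placeholder objective `kd_student_score`, and all hyperparameters are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code) of trust-region BO over an
# encoded student-architecture space, evaluated by a stand-in KD objective.
import numpy as np

rng = np.random.default_rng(0)
DIM = 3  # assumed encoding: e.g. (backbone family, depth mult., width mult.) in [0, 1]


def kd_student_score(x):
    """Placeholder for the expensive step: decode x into a student network,
    train it briefly with KD against the teacher, return validation accuracy.
    A smooth synthetic function stands in for that measurement here."""
    return float(-np.sum((x - 0.7) ** 2) + 0.05 * np.sin(10 * x).sum())


def gp_posterior(X, y, Xq, length_scale=0.2, noise=1e-4):
    """GP posterior mean/std with a fixed RBF kernel (hyperparameters assumed)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v ** 2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)


# Trust-region loop (TuRBO-style shrink/expand schedule; thresholds assumed).
X = rng.uniform(size=(5, DIM))                      # initial random encodings
y = np.array([kd_student_score(x) for x in X])
tr_length, succ, fail = 0.4, 0, 0

for it in range(30):
    best = X[np.argmax(y)]
    # Candidates are confined to a box (the trust region) around the incumbent.
    lo = np.clip(best - tr_length / 2, 0.0, 1.0)
    hi = np.clip(best + tr_length / 2, 0.0, 1.0)
    cand = rng.uniform(lo, hi, size=(256, DIM))
    mu, sd = gp_posterior(X, y, cand)
    x_next = cand[np.argmax(mu + 1.96 * sd)]        # UCB acquisition inside the region
    y_next = kd_student_score(x_next)

    # Expand the region after consecutive successes, shrink after failures.
    if y_next > y.max():
        succ, fail = succ + 1, 0
    else:
        succ, fail = 0, fail + 1
    if succ >= 3:
        tr_length, succ = min(2 * tr_length, 0.8), 0
    if fail >= 5:
        tr_length, fail = tr_length / 2, 0

    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

print("best encoded architecture:", X[np.argmax(y)], "score:", y.max())
```

In a real search, `kd_student_score` would dominate the cost, which is why the trust region matters: it concentrates the surrogate's candidates near already-promising student configurations instead of exploring the full space uniformly.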