Paper Title
Knowledge Distillation via the Target-aware Transformer
Paper Authors
Paper Abstract
Knowledge distillation has become a de facto standard for improving the performance of small neural networks. Most previous works propose to regress the representational features from the teacher to the student in a one-to-one spatial matching fashion. However, this tends to overlook the fact that, due to architectural differences, the semantic information at the same spatial location usually varies. This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. Specifically, we allow each pixel of the teacher feature to be distilled to all spatial locations of the student feature, weighted by a similarity generated from a target-aware transformer. Our approach surpasses state-of-the-art methods by a significant margin on various computer vision benchmarks, such as ImageNet, Pascal VOC and COCOStuff10k. Code is available at https://github.com/sihaoevery/TaT.
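The sketch below illustrates the one-to-all matching idea described in the abstract: each student position attends over all teacher positions, and the teacher feature is reassembled per student position before the regression loss is applied. This is a minimal illustration assuming pre-aligned feature shapes, dot-product similarity, and an MSE loss; the function name, projections, and loss choice are assumptions rather than the authors' exact implementation (see the linked repository for that).

```python
# Minimal sketch of one-to-all feature distillation, assuming pre-aligned
# student/teacher features; not the authors' exact implementation.
import torch
import torch.nn.functional as F

def one_to_all_distill_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """f_s: student feature (B, C, H, W); f_t: teacher feature (B, C, H, W).
    Channel and spatial sizes are assumed to be matched beforehand
    (e.g. by a 1x1 conv and interpolation)."""
    B, C, H, W = f_s.shape
    s = f_s.flatten(2).transpose(1, 2)   # (B, N, C): one row per student pixel
    t = f_t.flatten(2).transpose(1, 2)   # (B, N, C): one row per teacher pixel

    # Similarity of every student position to every teacher position,
    # normalized into attention weights (the "target-aware" correspondence).
    attn = torch.softmax(s @ t.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, N, N)

    # Reassemble the teacher feature for each student position as a weighted
    # sum over all teacher positions: one-to-all instead of one-to-one matching.
    t_aligned = attn @ t                 # (B, N, C)

    # Regress the student feature onto the position-wise reassembled teacher.
    return F.mse_loss(s, t_aligned)

# Usage sketch: add this term to the task loss during student training.
f_s = torch.randn(2, 64, 14, 14)         # student backbone feature
f_t = torch.randn(2, 64, 14, 14)         # teacher backbone feature
loss = one_to_all_distill_loss(f_s, f_t)
```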