Paper Title

Distilling Object Detectors with Task Adaptive Regularization

Paper Authors

Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Paper Abstract

Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth, and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to task-specific priors. The intuition is that simply distilling all information from teacher to student is not advisable; instead, we should only borrow priors from the teacher model where the student cannot perform well. Towards this goal, we propose a region proposal sharing mechanism to interflow region responses between the teacher and student models. Based on this, we adaptively transfer knowledge at three levels, i.e., feature backbone, classification head, and bounding box regression head, according to which model performs more reasonably. Furthermore, considering that minimizing the distillation loss and the detection loss simultaneously introduces an optimization dilemma, we propose a distillation decay strategy that helps improve model generalization by gradually reducing the distillation penalty. Experiments on widely used detection benchmarks demonstrate the effectiveness of our method. In particular, using Faster R-CNN with FPN as an instantiation, we achieve an accuracy of $39.0\%$ mAP with ResNet-50 on the COCO dataset, which surpasses the $36.3\%$ baseline by $2.7$ points and even outperforms the teacher model's $38.5\%$ mAP.
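To make the distillation decay strategy concrete, below is a minimal PyTorch-style sketch of combining the detection loss with a distillation penalty whose weight decays over training. The abstract only states that the penalty is gradually reduced; the linear schedule, the function names, and the `w0` parameter are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def distill_weight(step: int, total_steps: int, w0: float = 1.0) -> float:
    # Linearly decay the distillation weight from w0 to 0 over training.
    # Assumption: the abstract does not specify the exact decay schedule.
    return w0 * max(0.0, 1.0 - step / total_steps)

def total_loss(det_loss: torch.Tensor, distill_loss: torch.Tensor,
               step: int, total_steps: int) -> torch.Tensor:
    # The detection loss always applies; the distillation penalty fades out,
    # easing the dilemma of minimizing both objectives simultaneously.
    return det_loss + distill_weight(step, total_steps) * distill_loss

# Hypothetical usage inside a training loop:
# loss = total_loss(det_loss, distill_loss, step=it, total_steps=90000)
```

The design intuition is that the teacher's guidance is most useful early on, while late in training the student should be driven primarily by the detection objective itself.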
