Paper Title
Kernel Based Progressive Distillation for Adder Neural Networks
Paper Authors
Paper Abstract
Adder Neural Networks (ANNs), which only contain additions, bring us a new way of developing deep neural networks with low energy consumption. Unfortunately, there is an accuracy drop when replacing all convolution filters with adder filters. The main reason is the optimization difficulty of ANNs using the $\ell_1$-norm, in which the estimation of the gradient in back propagation is inaccurate. In this paper, we present a novel method for further improving the performance of ANNs without increasing the trainable parameters, via a progressive kernel based knowledge distillation (PKKD) method. A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network, and the features and weights of the ANN and the CNN are transformed into a new space to eliminate the accuracy drop. The similarity is computed in a higher-dimensional space to disentangle the difference between their distributions using a kernel based method. Finally, the desired ANN is learned progressively, based on information from both the ground truth and the teacher. The effectiveness of the proposed method for learning ANNs with higher performance is then well verified on several benchmarks. For instance, an ANN-50 trained with the proposed PKKD method obtains 76.8\% top-1 accuracy on the ImageNet dataset, which is 0.6\% higher than that of ResNet-50.
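To make the abstract's two ingredients concrete, below is a minimal PyTorch sketch of (1) an adder layer whose response is the negative $\ell_1$ distance between input patches and filters, and (2) a kernel based distillation loss that compares the ANN student's features with the CNN teacher's features in a mapped space, combined progressively with the ground-truth loss. The names (`Adder2d`, `kernel_distill_loss`, `progressive_loss`), the Gaussian kernel choice, and the weighting `alpha` are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of an adder layer and a kernel based distillation loss.
# Assumptions: Gaussian kernel as the higher-dimensional mapping, and a
# scalar `alpha` scheduled over training for the progressive combination.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adder2d(nn.Module):
    """Adder layer: output = -sum_i |x_i - w_i| over each sliding patch."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        self.stride, self.padding, self.kernel_size = stride, padding, kernel_size
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels * kernel_size * kernel_size)
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Extract sliding patches: (B, C*k*k, L)
        patches = F.unfold(x, self.kernel_size, stride=self.stride, padding=self.padding)
        # Negative L1 distance between every patch and every filter: (B, out, L)
        out = -(patches.unsqueeze(1) - self.weight.unsqueeze(0).unsqueeze(-1)).abs().sum(dim=2)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(b, -1, h_out, w_out)


def kernel_distill_loss(student_feat, teacher_feat, sigma=1.0):
    """Compare ANN and CNN features after an (assumed) Gaussian kernel mapping."""
    s = student_feat.flatten(1)
    t = teacher_feat.flatten(1)
    # Kernel similarity between student and teacher features; 1 means identical.
    k_st = torch.exp(-((s - t) ** 2).sum(dim=1) / (2 * sigma ** 2))
    return (1.0 - k_st).mean()


def progressive_loss(logits, labels, s_feat, t_feat, alpha):
    """Ground-truth supervision plus the teacher's guidance; `alpha` is
    scheduled over training to realize the progressive behaviour."""
    return F.cross_entropy(logits, labels) + alpha * kernel_distill_loss(s_feat, t_feat)
```

As a usage note, one would replace the convolutions of a ResNet-style backbone with `Adder2d` layers for the student, keep an ordinary CNN of the same architecture as the teacher, and ramp `alpha` during training so the student relies increasingly on the teacher's kernel-mapped features.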