Paper Title

A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation

Paper Authors

Wenjing Yang, Ganghua Wang, Jie Ding, Yuhong Yang

Paper Abstract

The goal of model compression is to reduce the size of a large neural network while retaining comparable performance. As a result, computation and memory costs in resource-limited applications may be significantly reduced by dropping redundant weights, neurons, or layers. Many model compression algorithms have been proposed that achieve impressive empirical success. However, the theoretical understanding of model compression is still limited. One problem is understanding whether a network is more compressible than another of the same structure. Another problem is quantifying how much one can prune a network with theoretically guaranteed accuracy degradation. In this work, we propose to use the sparsity-sensitive $\ell_q$-norm ($0<q<1$) to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression with a controlled accuracy degradation bound. We also develop adaptive algorithms for pruning each neuron in the network, informed by our theory. Numerical studies demonstrate the promising performance of the proposed methods compared with standard pruning algorithms.
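
Since the abstract hinges on the sparsity-sensitive $\ell_q$ quasi-norm ($0<q<1$) as a measure of soft sparsity, the following minimal Python sketch shows how that quantity can be computed for a single neuron's weight vector, alongside a plain magnitude-pruning step for comparison. The function names, the choice of $q=0.5$, and the pruning rule are illustrative assumptions; they are not the adaptive per-neuron algorithm developed in the paper.

```python
import numpy as np

def lq_quasi_norm(w, q=0.5):
    """Sparsity-sensitive l_q quasi-norm (0 < q < 1): (sum_i |w_i|^q)^(1/q)."""
    assert 0 < q < 1, "q must lie in (0, 1)"
    w = np.asarray(w, dtype=float)
    return float(np.sum(np.abs(w) ** q) ** (1.0 / q))

def prune_neuron(w, keep_fraction=0.5):
    """Illustrative magnitude pruning for a single neuron's weight vector:
    keep the largest-magnitude entries and zero out the rest. This is a
    generic baseline, not the adaptive per-neuron rule from the paper."""
    w = np.asarray(w, dtype=float)
    k = max(1, int(np.ceil(keep_fraction * w.size)))
    keep_idx = np.argsort(np.abs(w))[::-1][:k]  # indices of the k largest |w_i|
    pruned = np.zeros_like(w)
    pruned[keep_idx] = w[keep_idx]
    return pruned

# A "softly sparse" weight vector: a few large entries, many tiny ones.
w = np.array([1.2, -0.9, 0.05, 0.02, -0.01, 0.003])
print("l_0.5 quasi-norm:", lq_quasi_norm(w, q=0.5))
print("l_2 norm:        ", np.linalg.norm(w))
print("pruned weights:  ", prune_neuron(w, keep_fraction=0.5))
```

Intuitively, a weight vector whose $\ell_q$ quasi-norm is small relative to its $\ell_2$ norm concentrates most of its magnitude in a few coordinates; this is the soft-sparsity regime in which pruning should incur little accuracy degradation.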
