Paper Title

An Overview of Neural Network Compression

Paper Author

O'Neill, James

Paper Abstract

Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use given the increasing memory and storage requirements, not to mention the larger carbon footprint. Thus, in recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer. Hence, this paper provides a timely overview of both old and current compression techniques for deep neural networks, including pruning, quantization, tensor decomposition, knowledge distillation and combinations thereof. We assume a basic familiarity with deep learning architectures\footnote{For an introduction to deep learning, see ~\citet{goodfellow2016deep}}, namely, Recurrent Neural Networks~\citep[(RNNs)][]{rumelhart1985learning,hochreiter1997long}, Convolutional Neural Networks~\citep{fukushima1980neocognitron}~\footnote{For an up to date overview see~\citet{khan2019survey}} and Self-Attention based networks~\citep{vaswani2017attention}\footnote{For a general overview of self-attention networks, see ~\citet{chaudhari2019attentive}.},\footnote{For more detail and their use in natural language processing, see~\citet{hu2019introductory}}. Most of the papers discussed are proposed in the context of at least one of these DNN architectures.
