Paper Title
Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
Paper Authors
Paper Abstract
Quantized Neural Networks (QNNs) have attracted much attention due to their high efficiency. To improve quantization accuracy, prior works mainly focus on designing advanced quantization algorithms, but they still fail to achieve satisfactory results in the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNNs, and propose to combine Network Architecture Search methods with quantization to enjoy the merits of both sides. However, a naive combination inevitably faces unacceptable time consumption or unstable training problems. To alleviate these problems, we first propose joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. A bit-inheritance scheme is then introduced to transfer the quantized models to lower bit-widths, which further reduces the time cost while improving quantization accuracy. Equipped with this overall framework, dubbed Once Quantization-Aware Training (OQAT), our searched model family, OQATNets, achieves a new state of the art compared with various architectures under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet Top-1 accuracy, outperforming its 2-bit MobileNetV3 counterpart by a large margin of 9% with 10% less computation cost. A series of quantization-friendly architectures are easily identified, and extensive analysis can be made to summarize the interaction between quantization and neural architectures. Code and models are released at https://github.com/LaVieEnRoseSMZ/OQA
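The abstract mentions two mechanisms: quantization-aware training with a learnable, shared step size, and a bit-inheritance step that warm-starts a lower-bit model from a higher-bit one. The sketch below is a minimal illustration of these two ideas under stated assumptions, not the authors' released implementation; the class name `SharedStepQuantizer`, the LSQ-style straight-through quantizer, and the step-size doubling rule used for inheritance are illustrative choices, not the paper's exact recipe.

```python
# Minimal sketch (assumed, not the OQAT codebase): an LSQ-style quantizer with a
# single learnable step size, plus a simple bit-inheritance initialization.
import torch
import torch.nn as nn


class SharedStepQuantizer(nn.Module):
    """Symmetric uniform fake-quantizer with one learnable step size."""

    def __init__(self, bits: int, init_step: float = 0.1):
        super().__init__()
        self.bits = bits
        self.step = nn.Parameter(torch.tensor(init_step))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (self.bits - 1) - 1
        # Clamp to the representable range, then round with a straight-through
        # estimator so gradients flow to both the input and the step size.
        scaled = torch.clamp(x / self.step, -qmax - 1, qmax)
        q = scaled + (torch.round(scaled) - scaled).detach()
        return q * self.step


def inherit_to_lower_bit(q_high: SharedStepQuantizer) -> SharedStepQuantizer:
    """Build a (bits - 1) quantizer initialized from a trained higher-bit one.

    Doubling the step size keeps the representable range roughly constant when
    one bit is removed; this initialization rule is an assumption made here
    only to illustrate the idea of bit inheritance.
    """
    return SharedStepQuantizer(q_high.bits - 1, init_step=float(q_high.step) * 2.0)


if __name__ == "__main__":
    x = torch.randn(4, 8)
    q4 = SharedStepQuantizer(bits=4)
    print(q4(x).shape)               # 4-bit fake-quantized activations
    q3 = inherit_to_lower_bit(q4)    # 3-bit quantizer warm-started from 4-bit
    print(q3.bits, float(q3.step))
```

In a weight-sharing supernet setting, one such quantizer per layer would be reused by every sampled sub-architecture, which is one plausible reading of the "shared step size" in the abstract.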