Paper Title

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

Paper Authors

Sungho Shin, Yoonho Boo, Wonyong Sung

Paper Abstract

Designing a deep neural network (DNN) with good generalization capability is a complex process, especially when the weights are severely quantized. Model averaging is a promising approach for achieving good generalization in DNNs, especially when the loss surface for training contains many sharp minima. We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach consists of (1) floating-point model training, (2) direct quantization of the weights, (3) capturing multiple low-precision models during retraining with a cyclical learning rate, (4) averaging the captured models, and (5) re-quantizing the averaged model and fine-tuning it with a low learning rate. Additionally, we present a loss-visualization technique for the quantized weight domain to clearly elucidate the behavior of the proposed method. The visualization results indicate that a quantized DNN (QDNN) optimized with the proposed approach is located near the center of a flat minimum on the loss surface. With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on the CIFAR-100 and ImageNet datasets. Although we employed only a uniform quantization scheme, for ease of implementation on VLSI or low-precision neural processing units, the achieved performance exceeds that of previous studies employing non-uniform quantization.
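
The five numbered steps above describe an algorithmic recipe, so a minimal sketch may help make them concrete. The PyTorch code below is an illustrative interpretation of steps (2)-(5) only, not the authors' implementation: the symmetric uniform quantizer, the linear-decay cyclical learning-rate schedule, the one-snapshot-per-epoch capture, and all hyperparameters (`bits`, `cycles`, `lr_max`, `lr_min`) are assumptions, and quantization-aware training details such as straight-through estimators are omitted.

```python
import copy
import torch
import torch.nn as nn


def uniform_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Symmetric uniform quantizer: snap a weight tensor onto 2**bits - 1
    evenly spaced levels around zero (assumed scheme; the abstract only
    states that the quantization is uniform)."""
    half = max(2 ** (bits - 1) - 1, 1)             # e.g. bits=2 -> levels {-d, 0, +d}
    delta = w.abs().max().clamp_min(1e-8) / half   # step size from the largest weight
    return torch.clamp(torch.round(w / delta), -half, half) * delta


def quantize_model_(model: nn.Module, bits: int = 2) -> None:
    """In-place quantization of all weights, used for steps (2) and (5)."""
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(uniform_quantize(p, bits))


def sqwa_train(model, train_loader, loss_fn, bits=2, cycles=5,
               lr_max=1e-2, lr_min=1e-4, finetune_epochs=1):
    """Steps (2)-(5) of the abstract; `model` is assumed to have already
    been trained in floating point (step (1))."""
    quantize_model_(model, bits)                   # (2) direct quantization
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    snapshots = []

    # (3) Retraining with a cyclical learning rate: one cycle per epoch here,
    # decaying linearly from lr_max to lr_min and capturing a snapshot at the
    # end of every cycle. Quantization-aware training details (e.g. keeping
    # the forward pass quantized via a straight-through estimator) are omitted.
    for _ in range(cycles):
        steps = len(train_loader)
        for i, (x, y) in enumerate(train_loader):
            for g in opt.param_groups:
                g["lr"] = lr_max - (lr_max - lr_min) * i / max(steps - 1, 1)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        snapshots.append(copy.deepcopy(model.state_dict()))

    # (4) Average the captured models (floating-point tensors only; integer
    # buffers such as BatchNorm step counters are copied as-is).
    averaged = {}
    for name, tensor in snapshots[0].items():
        if tensor.dtype.is_floating_point:
            averaged[name] = torch.stack([s[name] for s in snapshots]).mean(dim=0)
        else:
            averaged[name] = tensor
    model.load_state_dict(averaged)

    # (5) Re-quantize the averaged model and fine-tune with a low learning rate.
    quantize_model_(model, bits)
    for g in opt.param_groups:
        g["lr"] = lr_min
    for _ in range(finetune_epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

As a usage note, `sqwa_train` would be called on a pretrained floating-point network together with a standard `DataLoader` and loss, e.g. `sqwa_train(model, train_loader, nn.CrossEntropyLoss())`; the snapshot averaging is what the abstract's visualization argues pushes the final quantized model toward the center of a flat minimum.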
