Paper Title

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

Paper Authors

Sungho Shin, Yoonho Boo, Wonyong Sung

Paper Abstract

Designing a deep neural network (DNN) with good generalization capability is a complex process, especially when the weights are severely quantized. Model averaging is a promising approach for achieving good generalization in DNNs, especially when the loss surface for training contains many sharp minima. We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach consists of (1) floating-point model training, (2) direct quantization of the weights, (3) capturing multiple low-precision models during retraining with a cyclical learning rate, (4) averaging the captured models, and (5) re-quantizing the averaged model and fine-tuning it with a low learning rate. Additionally, we present a loss-visualization technique for the quantized weight domain to clearly elucidate the behavior of the proposed method. The visualization results indicate that a quantized DNN (QDNN) optimized with the proposed approach is located near the center of a flat minimum on the loss surface. With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on the CIFAR-100 and ImageNet datasets. Although we employed only a uniform quantization scheme, for ease of implementation on VLSI or low-precision neural processing units, the achieved performance exceeds that of previous studies employing non-uniform quantization.
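
The five numbered steps above describe an algorithmic recipe, so a minimal sketch may help make them concrete. The PyTorch code below is an illustrative interpretation of steps (2)-(5) only, not the authors' implementation: the symmetric uniform quantizer, the linear-decay cyclical learning-rate schedule, the one-snapshot-per-epoch capture, and all hyperparameters (`bits`, `cycles`, `lr_max`, `lr_min`) are assumptions, and quantization-aware training details such as straight-through estimators are omitted.

```python
import copy
import torch
import torch.nn as nn


def uniform_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Symmetric uniform quantizer: snap a weight tensor onto 2**bits - 1
    evenly spaced levels around zero (assumed scheme; the abstract only
    states that the quantization is uniform)."""
    half = max(2 ** (bits - 1) - 1, 1)             # e.g. bits=2 -> levels {-d, 0, +d}
    delta = w.abs().max().clamp_min(1e-8) / half   # step size from the largest weight
    return torch.clamp(torch.round(w / delta), -half, half) * delta


def quantize_model_(model: nn.Module, bits: int = 2) -> None:
    """In-place quantization of all weights, used for steps (2) and (5)."""
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(uniform_quantize(p, bits))


def sqwa_train(model, train_loader, loss_fn, bits=2, cycles=5,
               lr_max=1e-2, lr_min=1e-4, finetune_epochs=1):
    """Steps (2)-(5) of the abstract; `model` is assumed to have already
    been trained in floating point (step (1))."""
    quantize_model_(model, bits)                   # (2) direct quantization
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    snapshots = []

    # (3) Retraining with a cyclical learning rate: one cycle per epoch here,
    # decaying linearly from lr_max to lr_min and capturing a snapshot at the
    # end of every cycle. Quantization-aware training details (e.g. keeping
    # the forward pass quantized via a straight-through estimator) are omitted.
    for _ in range(cycles):
        steps = len(train_loader)
        for i, (x, y) in enumerate(train_loader):
            for g in opt.param_groups:
                g["lr"] = lr_max - (lr_max - lr_min) * i / max(steps - 1, 1)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        snapshots.append(copy.deepcopy(model.state_dict()))

    # (4) Average the captured models (floating-point tensors only; integer
    # buffers such as BatchNorm step counters are copied as-is).
    averaged = {}
    for name, tensor in snapshots[0].items():
        if tensor.dtype.is_floating_point:
            averaged[name] = torch.stack([s[name] for s in snapshots]).mean(dim=0)
        else:
            averaged[name] = tensor
    model.load_state_dict(averaged)

    # (5) Re-quantize the averaged model and fine-tune with a low learning rate.
    quantize_model_(model, bits)
    for g in opt.param_groups:
        g["lr"] = lr_min
    for _ in range(finetune_epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

As a usage note, `sqwa_train` would be called on a pretrained floating-point network together with a standard `DataLoader` and loss, e.g. `sqwa_train(model, train_loader, nn.CrossEntropyLoss())`; the snapshot averaging is what the abstract's visualization argues pushes the final quantized model toward the center of a flat minimum.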
