Paper Title
Post-Training Piecewise Linear Quantization for Deep Neural Networks
Paper Authors
Abstract
Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training dataset. The well-established uniform scheme for post-training quantization achieves satisfactory results by converting neural networks from full-precision to 8-bit fixed-point integers. However, it suffers from significant performance degradation when quantizing to lower bit-widths. In this paper, we propose a piecewise linear quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. Our approach breaks the entire quantization range into non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. Optimal breakpoints that divide the entire range are found by minimizing the quantization error. Compared to state-of-the-art post-training quantization methods, experimental results show that our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.
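To illustrate the idea described above, the following is a minimal, hypothetical sketch (not the authors' implementation) of piecewise linear quantization on a single tensor. It assumes a symmetric range split into a center region [-p, p] and two tail regions, gives each region its own uniform grid with an equal number of levels, and searches candidate breakpoints p by minimizing the mean squared quantization error. The function names `uniform_quantize` and `pwlq_quantize` and the grid-search parameters are illustrative assumptions, not part of the paper.

```python
import numpy as np

def uniform_quantize(x, lo, hi, num_levels):
    # Uniformly quantize values of x, clipped to [lo, hi], onto num_levels levels.
    if hi <= lo:
        return np.full_like(x, lo)
    step = (hi - lo) / (num_levels - 1)
    return np.round((np.clip(x, lo, hi) - lo) / step) * step + lo

def pwlq_quantize(w, bits=4, num_candidates=50):
    # Sketch of piecewise linear quantization: split the symmetric range [-m, m]
    # into a center region [-p, p] and two tail regions, quantize each region
    # uniformly with 2**(bits-1) levels (one bit indexes the region), and pick
    # the breakpoint p that minimizes the mean squared quantization error.
    m = np.max(np.abs(w))
    levels = 2 ** (bits - 1)
    best_err, best_q = np.inf, None
    for frac in np.linspace(0.05, 0.95, num_candidates):
        p = frac * m
        center = np.abs(w) <= p
        q = np.empty_like(w)
        q[center] = uniform_quantize(w[center], -p, p, levels)
        # Tail regions: quantize magnitudes in (p, m], then restore the sign.
        tails = ~center
        q[tails] = np.sign(w[tails]) * uniform_quantize(np.abs(w[tails]), p, m, levels)
        err = np.mean((w - q) ** 2)
        if err < best_err:
            best_err, best_q = err, q
    return best_q, best_err

# Usage example: quantize a bell-shaped weight tensor to 4 bits.
w = np.random.randn(1000).astype(np.float32)
q, err = pwlq_quantize(w, bits=4)
print("quantization MSE:", err)
```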