Paper title
Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors
Paper authors
Paper abstract
Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes make it difficult to deploy neural networks efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that combines network compression with acceleration at arbitrary sub-byte precision. By sharing a pool of weights across the entire network, the framework can achieve up to 8x compression compared to 8-bit networks. We further propose a software implementation based on bit-serial lookup that allows a runtime-bitwidth tradeoff and achieves more than 2.8x speedup and 7.5x storage compression compared to 8-bit weight pool networks, with less than 1% accuracy drop.
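The abstract does not spell out how the bit-serial lookup works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each weight vector in the shared pool has a precomputed table of partial sums, indexed by one bit plane of the activations, and that a dot product is assembled by one lookup per activation bit. The names `build_lut` and `bit_serial_dot`, the group size of 4, and the unsigned 8-bit activations are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: table layout, group size, and names are assumptions.

def build_lut(pool):
    """For each pool vector, precompute lut[p][mask] = sum of pool[p][i]
    over every bit position i that is set in mask."""
    lut = []
    for vec in pool:
        row = [0] * (1 << len(vec))
        for mask in range(1, 1 << len(vec)):
            low = mask & -mask            # lowest set bit of mask
            i = low.bit_length() - 1      # index of that bit
            row[mask] = row[mask ^ low] + vec[i]
        lut.append(row)
    return lut


def bit_serial_dot(lut_row, acts, total_bits=8, used_bits=8):
    """Dot product of one pool vector with unsigned activations.

    Processes only the `used_bits` most significant bit planes, so a
    smaller `used_bits` means fewer lookups and a coarser result.
    """
    result = 0
    for b in range(total_bits - 1, total_bits - 1 - used_bits, -1):
        mask = 0
        for i, a in enumerate(acts):
            mask |= ((a >> b) & 1) << i   # gather bit b of every activation
        result += lut_row[mask] * (1 << b)
    return result


if __name__ == "__main__":
    pool = [[3, -2, 5, 1], [-4, 0, 7, -1]]   # hypothetical shared weight pool
    lut = build_lut(pool)
    acts = [200, 17, 255, 64]                # hypothetical 8-bit activations
    print(bit_serial_dot(lut[0], acts))              # full 8-bit precision
    print(bit_serial_dot(lut[0], acts, used_bits=4)) # top 4 bit planes only
```

Under these assumptions, a layer would store only small indices into `pool` instead of full 8-bit weights, which is where the compression would come from, and lowering `used_bits` at run time would trade accuracy for fewer table lookups.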