Paper title
High Performance Depthwise and Pointwise Convolutions on Mobile Devices
Authors
Abstract
Lightweight convolutional neural networks (e.g., MobileNets) are specifically designed to carry out inference directly on mobile devices. Depthwise convolution (DWConv) and pointwise convolution (PWConv) are the key operations in these lightweight models. In this paper, we observe that the existing implementations of DWConv and PWConv do not fully utilize the ARM processors in mobile devices: they suffer from many cache misses under multi-core execution and from poor data reuse at the register level. We propose techniques to re-optimize the implementations of DWConv and PWConv for the ARM architecture. Experimental results show that our implementations achieve speedups of up to 5.5x and 2.1x over TVM (Chen et al. 2018) on DWConv and PWConv, respectively.
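For reference, here is a minimal sketch of what the two operators compute, written as naive pure-Python loops. It assumes a single image in a [C][H][W] nested-list layout, depthwise filters of shape [C][K][K], pointwise weights of shape [M][C], and zero padding. The function names are hypothetical. This is not the paper's optimized ARM implementation; it only shows the semantics that such implementations must reproduce.

```python
def depthwise_conv2d(x, w, stride=1, pad=0):
    """Naive depthwise convolution (cross-correlation, as in most DL frameworks).

    x: input [C][H][W]; w: one K x K filter per channel, shape [C][K][K].
    Output channel c depends only on input channel c (no cross-channel sums).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(w[0])
    Ho = (H + 2 * pad - K) // stride + 1
    Wo = (W + 2 * pad - K) // stride + 1
    out = [[[0.0] * Wo for _ in range(Ho)] for _ in range(C)]
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                acc = 0.0
                for ki in range(K):
                    for kj in range(K):
                        h = i * stride + ki - pad
                        v = j * stride + kj - pad
                        if 0 <= h < H and 0 <= v < W:  # out-of-range taps read zero padding
                            acc += x[c][h][v] * w[c][ki][kj]
                out[c][i][j] = acc
    return out


def pointwise_conv2d(x, w):
    """Naive pointwise (1x1) convolution.

    x: input [C][H][W]; w: weights [M][C]. Computes
    out[m][h][v] = sum_c w[m][c] * x[c][h][v], i.e. the M x C weight matrix
    multiplied by the input viewed as a C x (H*W) matrix.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M = len(w)
    out = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    for m in range(M):
        for c in range(C):
            wmc = w[m][c]  # one weight is reused across all H*W positions
            for h in range(H):
                row_in, row_out = x[c][h], out[m][h]
                for v in range(W):
                    row_out[v] += wmc * row_in[v]
    return out


# Depthwise separable block as in MobileNets: DWConv followed by PWConv.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # channel 0
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]  # channel 1
dw_w = [[[1, 0], [0, 1]],                # 2x2 filter for channel 0
        [[1, 1], [1, 1]]]                # 2x2 filter for channel 1
y = depthwise_conv2d(x, dw_w)            # shape [2][2][2]
pw_w = [[1, 1], [2, -1]]                 # 2 output channels x 2 input channels
z = pointwise_conv2d(y, pw_w)            # shape [2][2][2]
```

The sketch makes the cost profiles of the two operators visible. DWConv performs only K*K multiply-accumulates per output element and never sums across channels. PWConv, by contrast, is exactly a matrix multiplication, which is why optimized libraries commonly lower it to GEMM.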