Paper Title
An Efficient Accelerator Design Methodology for Deformable Convolutional Networks
Authors
Abstract
Deformable convolutional networks have demonstrated outstanding performance in object recognition tasks thanks to their effective feature extraction. Unlike standard convolution, deformable convolution determines the size of its receptive field from dynamically generated offsets, which leads to irregular memory access. In particular, the memory access pattern varies both spatially and temporally, making static optimization ineffective. A naive implementation would therefore incur an excessive memory footprint. In this paper, we present a novel approach to accelerating deformable convolution on an FPGA. First, we propose a novel training method that reduces the size of the receptive field in the deformable convolutional layer without compromising accuracy. By optimizing the receptive field, we compress its maximum size by 12.6 times. Second, we propose an efficient systolic architecture to maximize processing efficiency. We then implement our design on an FPGA to support the optimized dataflow. Experimental results show that our accelerator achieves a speedup of up to 17.25 times over the state-of-the-art accelerator.
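To make the memory-access issue concrete, the following is a minimal pure-Python sketch of a single output pixel of a 3x3 deformable convolution. It is an illustration only, not the paper's implementation: the function names `bilinear` and `deformable_conv_at`, the zero-padding of out-of-range reads, and the "extent" measure (the side length of the smallest centred square window that covers every displaced tap position, not counting any extra neighbour pixel read by interpolation) are all assumptions introduced here.

```python
import math


def bilinear(feature, y, x):
    """Sample a 2-D list at fractional (y, x); out-of-range pixels read as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
            if 0 <= yy < h and 0 <= xx < w:
                value += weight * feature[yy][xx]
    return value


def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """Compute one output pixel of a 3x3 deformable convolution at (cy, cx).

    offsets[k] = (dy, dx) is the displacement of kernel tap k (row-major).
    Returns (output value, extent), where extent is an assumed measure:
    the side of the smallest centred square covering all tap positions.
    """
    out = 0.0
    reach = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            # The sampling position depends on data-dependent offsets,
            # so the addresses read cannot be known ahead of time.
            out += kernel[ky + 1][kx + 1] * bilinear(
                feature, cy + ky + dy, cx + kx + dx
            )
            reach = max(reach, abs(ky + dy), abs(kx + dx))
            k += 1
    return out, 2 * math.ceil(reach) + 1


# Usage: the same kernel touches a wider window once offsets are non-zero.
feature = [[y * 7 + x for x in range(7)] for y in range(7)]
kernel = [[1.0] * 3 for _ in range(3)]
_, plain_extent = deformable_conv_at(feature, kernel, [(0.0, 0.0)] * 9, 3, 3)
_, shifted_extent = deformable_conv_at(feature, kernel, [(0.5, 0.5)] * 9, 3, 3)
```

With zero offsets the window is the ordinary 3x3 patch; with every tap shifted by half a pixel the window grows to 5x5. Larger offsets widen the region a hardware buffer would have to hold, which is why bounding the receptive field during training, as the abstract describes, can reduce the memory footprint.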