Paper Title


FPGA-based Neural Network Accelerator for Millimeter-Wave Radio-over-Fiber Systems

Authors

Lee, Jeonghun, He, Jiayuan, Wang, Ke

Abstract


With the rapid development of high-speed wireless communications, the 60 GHz millimeter-wave (mm-wave) frequency range and radio-over-fiber (RoF) systems have been investigated as a promising solution for delivering mm-wave signals. Neural networks have been studied to improve mm-wave RoF system performance at the receiver side by suppressing linear and nonlinear impairments. However, previous neural network studies in mm-wave RoF systems have focused on off-line implementations with high-end GPUs, which are not practical for low-power, low-cost applications with limited computation resources. To address this issue, we investigate neural network hardware accelerator implementations using the field-programmable gate array (FPGA), taking advantage of the FPGA's low power consumption, parallel computation capability, and reconfigurability. Convolutional neural network (CNN) and binary convolutional neural network (BCNN) hardware accelerators are demonstrated. In addition, to satisfy the low-latency requirement of mm-wave RoF systems and to enable the use of low-cost compact FPGA devices, a novel inner parallel optimization method is proposed. Compared with execution on an embedded processor (ARM Cortex-A9), the FPGA-based CNN/BCNN hardware accelerators reduce latency by over 92%. Compared with non-optimized FPGA implementations, the proposed optimization method reduces processing latency by over 44% for both CNN and BCNN. Compared with the GPU implementation, the CNN implementation with the proposed optimization method reduces latency by 85.49% and power consumption by 86.91%. Although the BCNN implementation with the proposed optimization method has higher latency than the GPU implementation, it reduces power consumption by 86.14%. FPGA-based neural network hardware accelerators therefore provide a promising solution for mm-wave RoF systems.
