Paper Title

Pre-RTL DNN Hardware Evaluator With Fused Layer Support

Authors

Chih-Chyau Yang, Tian-Sheuan Chang

Abstract

With the growing popularity of deep neural networks (DNNs), hardware accelerators are in demand for real-time execution. However, the lengthy design process and fast-evolving DNN models make it hard for hardware evaluation to keep pace with time-to-market needs. This paper proposes a pre-RTL DNN hardware evaluator that supports conventional layer-by-layer processing as well as fused-layer processing, which lowers the external bandwidth requirement. The evaluator supports two state-of-the-art accelerator architectures and finds the best hardware configuration and layer fusion groups. Experimental results show that, compared with layer-by-layer operation, the layer fusion scheme achieves a 55.6% reduction in memory bandwidth, a 36.7% improvement in latency, and a 49.2% reduction in energy.
