RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

Paper Title

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

Authors

Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao

Abstract

Automatic speech recognition based on recurrent neural networks (RNNs) has become prevalent on mobile devices such as smartphones. However, previous RNN compression techniques either suffer from hardware performance overhead due to irregularity, or incur significant accuracy loss due to the regularity they preserve for hardware friendliness. In this work, we propose RTMobile, which leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. RTMobile is the first work that achieves real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile significantly outperforms existing RNN hardware acceleration methods in terms of inference accuracy and time. Compared with prior work on FPGA, RTMobile running a GRU model on the Adreno 640 embedded GPU improves energy efficiency by about 40$\times$ while maintaining the same inference time.
