Paper Title

E-BATCH: Energy-Efficient and High-Throughput RNN Batching

Authors

Franyell Silfa, Jose Maria Arnau, Antonio Gonzalez

Abstract

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput, but RNN batching requires a large amount of padding since the batched input sequences may differ greatly in length. Schemes that dynamically update the batch every few time-steps avoid padding; however, they require executing different RNN layers in a short timespan, decreasing energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators, consisting of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies the runtime when the evaluation of a sequence is done, so that a new sequence can be added to the batch immediately, greatly reducing the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency on a given hardware platform. We evaluate E-BATCH on top of E-PUR and the TPU. On E-PUR, E-BATCH improves throughput by 1.8x and energy efficiency by 3.6x; on the TPU, it improves throughput by 2.1x and energy efficiency by 1.6x, over the state-of-the-art.
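The padding argument in the abstract can be illustrated with a toy simulation. This is a sketch under assumed queueing semantics, not the authors' implementation: function names, the example sequence lengths, and the cost model (lane-steps, i.e. one accelerator lane evaluating one time-step) are all illustrative assumptions.

```python
# Toy cost model for RNN batching, measured in lane-steps.
# All names and policies below are illustrative assumptions.

def padded_batch_steps(lengths, batch_size):
    """Static batching: sequences are grouped up front, and every
    batch runs for as long as its longest member, so shorter
    sequences pay for padding."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += max(batch) * len(batch)
    return total

def refilled_batch_steps(lengths, batch_size):
    """E-BATCH-style batching: the accelerator reports when a
    sequence finishes, so a waiting sequence takes over the free
    lane immediately; padding is paid only once the queue drains."""
    queue = list(lengths)
    slots = []          # remaining time-steps of each active sequence
    total = 0
    while queue or slots:
        while len(slots) < batch_size and queue:  # immediate refill
            slots.append(queue.pop(0))
        total += batch_size                       # one step, all lanes
        slots = [s - 1 for s in slots if s > 1]   # drop finished lanes
    return total

lengths = [10, 2, 3, 9]                  # hypothetical request lengths
print(padded_batch_steps(lengths, 2))    # 38 lane-steps
print(refilled_batch_steps(lengths, 2))  # 28 lane-steps
```

With static batching the short sequences (2 and 3 steps) are padded up to their batch mates (10 and 9 steps); with immediate refilling, total work stays much closer to the lower bound of `sum(lengths)` = 24 lane-steps.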
