Paper Title
Resource Allocation for Multiuser Edge Inference with Batching and Early Exiting (Extended Version)
Authors
Abstract
The deployment of inference services at the network edge, called edge inference, offloads computation-intensive inference tasks from mobile devices to edge servers, thereby enhancing the former's capabilities and battery life. In a multiuser system, the joint allocation of communication-and-computation ($\text{C}^\text{2}$) resources (i.e., scheduling and bandwidth allocation) is made challenging by the adoption of two efficient inference techniques, batching and early exiting, and is further complicated by the heterogeneity of users' requirements on accuracy and latency. Batching groups multiple tasks into one batch for parallel processing, which reduces time-consuming memory access and thereby boosts throughput (i.e., completed tasks per second). Early exiting, on the other hand, allows a task to exit a deep neural network without traversing the whole network, supporting a tradeoff between accuracy and latency. In this work, we study optimal $\text{C}^\text{2}$ resource allocation with batching and early exiting, which is an NP-complete integer programming problem. By tackling this challenge, we design a set of efficient algorithms under the criterion of maximum throughput. Experimental results demonstrate that both the optimal and the sub-optimal $\text{C}^\text{2}$ resource allocation algorithms can leverage integrated batching and early exiting to double the inference throughput compared with conventional schemes.