Paper Title
Communication-Computation Trade-Off in Resource-Constrained Edge Inference
Paper Authors
Paper Abstract
The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario to provide DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3) task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly reduces the inference latency compared with baseline methods.
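To make the three-step pipeline concrete, below is a minimal PyTorch sketch of device-edge co-inference under stated assumptions: the backbone is a toy CNN (not the paper's network), the split index `SPLIT` is a hypothetical choice rather than the outcome of the paper's split point selection, and 8-bit uniform quantization stands in for the paper's learned, task-oriented feature encoding. It illustrates the mechanics of the trade-off, not the authors' actual method.

```python
# Sketch of device-edge co-inference: run the first layers on the device,
# encode the intermediate feature, and finish inference on the edge server.
import torch
import torch.nn as nn

# Toy backbone; the split point below is a hypothetical choice.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # on device
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # on server
    nn.Flatten(), nn.Linear(32 * 8 * 8, 10),
)
SPLIT = 3  # layers [0, SPLIT) run on the device, the rest on the server
device_model, server_model = backbone[:SPLIT], backbone[SPLIT:]

def encode(feature: torch.Tensor):
    """8-bit uniform quantization: a stand-in for task-oriented encoding."""
    lo, hi = feature.min(), feature.max()
    scale = (hi - lo).clamp(min=1e-8) / 255.0
    q = ((feature - lo) / scale).round().to(torch.uint8)
    return q, lo.item(), scale.item()

def decode(q: torch.Tensor, lo: float, scale: float) -> torch.Tensor:
    return q.float() * scale + lo

x = torch.randn(1, 3, 32, 32)                      # input sensed at the device
feat = device_model(x)                             # on-device computation
payload, lo, scale = encode(feat)                  # compress before the uplink
print(f"uplink bytes: {payload.numel()} (vs. {feat.numel() * 4} as float32)")
logits = server_model(decode(payload, lo, scale))  # server-side inference
print("predicted class:", logits.argmax(dim=1).item())
```

Moving `SPLIT` earlier shrinks the on-device computation but typically enlarges the intermediate feature (and thus the uplink payload), while a later split does the opposite; this is the communication-computation trade-off the framework navigates.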