Paper Title
GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
Paper Authors
Paper Abstract
Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension. Most notably, edge-box GPUs lack the memory needed to concurrently house the growing number of (increasingly complex) models for real-time inference. Unfortunately, existing solutions that rely on time/space sharing of GPU resources are insufficient as the required swapping delays result in unacceptable frame drops and accuracy violations. We present model merging, a new memory management technique that exploits architectural similarities between edge vision models by judiciously sharing their layers (including weights) to reduce workload memory costs and swapping delays. Our system, GEMEL, efficiently integrates merging into existing pipelines by (1) leveraging several guiding observations about per-model memory usage and inter-layer dependencies to quickly identify fruitful and accuracy-preserving merging configurations, and (2) altering edge inference schedules to maximize merging benefits. Experiments across diverse workloads reveal that GEMEL reduces memory usage by up to 60.7%, and improves overall accuracy by 8-39% relative to time/space sharing alone.
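The core idea the abstract describes, architecturally identical layers across edge vision models pointing at a single set of weights so they occupy GPU memory only once, can be sketched in a few lines of PyTorch. The model classes, layer shapes, and merging step below are illustrative assumptions for exposition, not GEMEL's actual merging procedure or configuration search.

```python
# Minimal sketch (assumed example, not GEMEL's implementation) of layer sharing:
# two vision models whose architecturally identical backbone layers reuse the
# same weight tensors, so the shared layers are stored once on the GPU.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Backbone layers assumed to be architecturally identical across models.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Task-specific head that is NOT shared between models.
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x).flatten(1))

model_a = TinyDetector(num_classes=5)
model_b = TinyDetector(num_classes=9)

# "Merge" the backbones: model_b reuses model_a's backbone module, so the
# shared layers (weights included) exist once in memory instead of twice.
model_b.backbone = model_a.backbone

shared_params = sum(p.numel() for p in model_a.backbone.parameters())
print(f"Backbone parameters stored once instead of twice: {shared_params}")
```

In an actual deployment the merged models would still need retraining or fine-tuning to preserve per-task accuracy, which is part of what the paper's merging configuration search addresses.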