Paper Title

VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling

Authors

Zihan Liu, Jingwen Leng, Zhihui Zhang, Quan Chen, Chao Li, Minyi Guo

Abstract

Deep learning (DL) models have achieved great success in many application domains. As such, many industrial companies such as Google and Facebook have acknowledged the importance of multi-tenant DL services. Although multi-tenant serving has been studied for conventional workloads, it has not been deeply studied for deep learning services, especially on general-purpose hardware. In this work, we systematically analyze the opportunities and challenges of providing multi-tenant deep learning services on the general-purpose CPU architecture from the aspects of scheduling granularity and code generation. We propose an adaptive granularity scheduling scheme to both guarantee resource usage efficiency and reduce the scheduling conflict rate. We also propose an adaptive compilation strategy, by which we can dynamically and intelligently pick a program with proper exclusive and shared resource usage to reduce overall interference-induced performance loss. Compared to existing works, our design can serve more requests under the same QoS target in various scenarios (e.g., +71%, +62%, +45% for light, medium, and heavy workloads, respectively), and reduce the average query latency by 50%.
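
The abstract's core mechanism is compiling multiple program variants and picking one at runtime according to the current interference level. Below is a minimal illustrative sketch of that idea in Python; the variant names, the `sensitivity`/`system_pressure` interference model, and the `pick_variant` selection rule are assumptions made here for exposition, not the paper's actual implementation or interfaces.

```python
# Illustrative sketch only: names and the interference model are hypothetical,
# not VELTAIR's actual API.
from dataclasses import dataclass

@dataclass
class CompiledVariant:
    name: str
    threads: int            # exclusive resource usage (cores requested)
    est_latency_ms: float   # standalone latency estimate from profiling
    sensitivity: float      # assumed slowdown per unit of co-runner pressure

def pick_variant(variants, system_pressure, qos_target_ms):
    """Pick the variant expected to meet the QoS target with the fewest cores.

    `system_pressure` abstracts the current interference from co-located
    tenants (e.g., derived from shared-cache or bandwidth contention metrics).
    """
    feasible = []
    for v in variants:
        predicted = v.est_latency_ms * (1.0 + v.sensitivity * system_pressure)
        if predicted <= qos_target_ms:
            feasible.append((v.threads, predicted, v))
    if not feasible:
        # No variant meets QoS: fall back to the lowest predicted latency.
        return min(
            variants,
            key=lambda v: v.est_latency_ms * (1.0 + v.sensitivity * system_pressure),
        )
    # Among QoS-feasible variants, prefer the smallest exclusive footprint.
    return min(feasible, key=lambda t: (t[0], t[1]))[2]

variants = [
    CompiledVariant("wide-parallel", threads=16, est_latency_ms=4.0, sensitivity=0.8),
    CompiledVariant("narrow-parallel", threads=4, est_latency_ms=9.0, sensitivity=0.2),
]
print(pick_variant(variants, system_pressure=0.5, qos_target_ms=12.0).name)
```

In this sketch, when a narrower variant still meets the QoS target under the current interference, it is preferred because it leaves more exclusive resources to co-located tenants; this is the kind of trade-off between exclusive and shared resource usage that the abstract's adaptive compilation strategy describes.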
