Achieving Zero Asymptotic Queueing Delay for Parallel Jobs

Paper Title

Achieving Zero Asymptotic Queueing Delay for Parallel Jobs

Authors

Wentao Weng, Weina Wang

Abstract

Zero queueing delay is highly desirable in large-scale computing systems. Existing work has shown that it can be asymptotically achieved by using the celebrated Power-of-$d$-choices (Pod) policy with a probe overhead $d = \omega\left(\frac{\log N}{1-\lambda}\right)$, and it is impossible when $d = O\left(\frac{1}{1-\lambda}\right)$, where $N$ is the number of servers and $\lambda$ is the load of the system. However, these results are based on the model where each job is an indivisible unit, which does not capture the parallel structure of jobs in today's predominant parallel computing paradigm. This paper thus considers a model where each job consists of a batch of parallel tasks. Under this model, we propose a new notion of zero (asymptotic) queueing delay that requires the job delay under a policy to approach the job delay given by the max of its tasks' service times, i.e., the job delay assuming its tasks entered service right upon arrival. This notion quantifies the effect of queueing on a job level for jobs consisting of multiple tasks, and thus deviates from the conventional zero queueing delay for single-task jobs in the literature. We show that zero queueing delay for parallel jobs can be achieved using the batch-filling policy (a variant of the celebrated Pod policy) with a probe overhead $d = \omega\left(\frac{1}{(1-\lambda)\log k}\right)$ in the sub-Halfin-Whitt heavy-traffic regime, where $k$ is the number of tasks in each job and $k$ properly scales with $N$ (the number of servers). This result demonstrates that for parallel jobs, zero queueing delay can be achieved with a smaller probe overhead. We also establish an impossibility result: we show that zero queueing delay cannot be achieved if $d = \exp\left(o\left(\frac{\log N}{\log k}\right)\right)$.
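
To make the two central objects of the abstract concrete, the following is a minimal Python sketch of a batch-filling style dispatch step and of the ideal job delay that the zero-queueing-delay notion compares against. The abstract does not spell out the batch-filling rule, so the sketch assumes the common description of it: probe about $kd$ servers for a job of $k$ tasks, then place tasks one at a time on the currently shortest probed queue. The function names `batch_filling_assign` and `ideal_job_delay` and the rounding of $kd$ are illustrative assumptions, not taken from the paper.

```python
import heapq
import math
import random


def batch_filling_assign(queue_lengths, k, d, rng=random):
    """Place the k tasks of one job with a batch-filling style rule.

    Assumption (not stated in the abstract): the dispatcher probes
    ceil(k * d) distinct servers, capped at the number of servers, and
    then assigns tasks one by one to the shortest probed queue, counting
    the tasks it has already placed. Mutates queue_lengths in place and
    returns the chosen server index for each task.
    """
    n = len(queue_lengths)
    num_probes = min(n, max(1, math.ceil(k * d)))
    probed = rng.sample(range(n), num_probes)

    # Min-heap keyed on queue length; ties are broken by server index.
    heap = [(queue_lengths[s], s) for s in probed]
    heapq.heapify(heap)

    placement = []
    for _ in range(k):
        length, server = heapq.heappop(heap)
        placement.append(server)
        queue_lengths[server] = length + 1
        heapq.heappush(heap, (length + 1, server))
    return placement


def ideal_job_delay(task_service_times):
    """Job delay if every task entered service immediately on arrival."""
    return max(task_service_times)
```

Under this reading, a policy achieves zero queueing delay when the actual job delay (the time until the last task finishes, including any waiting in queues) approaches `ideal_job_delay` as the system grows. The heap keeps each placement at logarithmic cost in the number of probed servers, which is a convenience of the sketch rather than part of the policy.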
