Paper Title
Divide (CPU Load) and Conquer: Semi-Flexible Cloud Resource Allocation
Paper Authors
Paper Abstract
Cloud resource management is often modeled by two-dimensional bin packing with a set of items that correspond to tasks having fixed CPU and memory requirements. However, applications running in clouds are much more flexible: modern frameworks allow a single application to be (horizontally) scaled to dozens, even hundreds of instances; and then the load balancer can precisely divide the workload between them. We analyze a model that captures this (semi)-flexibility of cloud resource management. Each cloud application is characterized by its memory footprint and its momentary CPU load. Combining the scheduler and the load balancer, the resource manager decides how many instances of each application will be created and how the CPU load will be balanced between them. In contrast to the divisible load model, each instance of the application requires a certain amount of memory, independent of the number of instances. Thus, the resource manager effectively trades additional memory for more evenly balanced load. We study two objectives: the bin-packing-like minimization of the number of machines used; and the makespan-like minimization of the maximum load among all the machines. We prove NP-hardness of the general problems, but also propose polynomial-time exact algorithms for boundary special cases. Notably, we show that (semi)-flexibility may result in reducing the required number of machines by a tight factor of $2-\varepsilon$. For the general case, we propose heuristics that we validate by simulation on instances derived from the Azure trace.
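To illustrate the semi-flexible model described in the abstract, below is a minimal, hypothetical Python sketch (not the paper's algorithm): each application has a fixed per-instance memory footprint and a divisible CPU load, and a simple first-fit greedy splits the load across instances, paying the full memory footprint per instance in exchange for a more evenly spread load. All names (Application, Machine, greedy_semi_flexible) and the capacity values are assumptions made for the example.

from dataclasses import dataclass
from typing import List


@dataclass
class Application:
    memory: float    # memory footprint of a single instance (fixed per instance)
    cpu_load: float  # total momentary CPU load, divisible among instances


@dataclass
class Machine:
    mem_capacity: float
    cpu_capacity: float
    mem_used: float = 0.0
    cpu_used: float = 0.0

    def fits(self, memory: float, cpu: float) -> bool:
        return (self.mem_used + memory <= self.mem_capacity
                and self.cpu_used + cpu <= self.cpu_capacity)

    def place(self, memory: float, cpu: float) -> None:
        self.mem_used += memory
        self.cpu_used += cpu


def greedy_semi_flexible(apps: List[Application],
                         mem_capacity: float,
                         cpu_capacity: float) -> List[Machine]:
    """Illustrative first-fit heuristic: split each application's CPU load into
    pieces so that each piece fits on some machine; every instance still
    consumes the full memory footprint, regardless of how small its piece is."""
    machines: List[Machine] = []
    for app in apps:
        remaining = app.cpu_load
        while remaining > 1e-9:
            placed = False
            for m in machines:
                # Largest piece of load this machine can still absorb.
                piece = min(remaining, m.cpu_capacity - m.cpu_used)
                if piece > 1e-9 and m.fits(app.memory, piece):
                    m.place(app.memory, piece)
                    remaining -= piece
                    placed = True
                    break
            if not placed:
                # Open a new machine and put the largest feasible piece on it.
                assert app.memory <= mem_capacity, "instance does not fit anywhere"
                m = Machine(mem_capacity, cpu_capacity)
                piece = min(remaining, cpu_capacity)
                m.place(app.memory, piece)
                remaining -= piece
                machines.append(m)
    return machines


if __name__ == "__main__":
    # One heavy application plus two light ones; splitting the heavy load
    # costs extra memory but keeps the per-machine CPU load bounded.
    apps = [Application(memory=2.0, cpu_load=3.5),
            Application(memory=1.0, cpu_load=0.5),
            Application(memory=1.0, cpu_load=0.5)]
    machines = greedy_semi_flexible(apps, mem_capacity=4.0, cpu_capacity=2.0)
    print(f"machines used: {len(machines)}")
    for i, m in enumerate(machines):
        print(f"  machine {i}: mem={m.mem_used:.1f}, cpu={m.cpu_used:.1f}")

In this toy run, the heavy application is split over two machines rather than overloading one, which is the memory-for-balance trade-off the model formalizes; the paper's own heuristics and exact algorithms for the two objectives are not reproduced here.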