Paper Title

Fast-Fourier-Forecasting Resource Utilisation in Distributed Systems

Paper Authors

Pritz, Paul J., Perez, Daniel, Leung, Kin K.

Paper Abstract

Distributed computing systems often consist of hundreds of nodes, executing tasks with different resource requirements. Efficient resource provisioning and task scheduling in such systems are non-trivial and require close monitoring and accurate forecasting of the state of the system, specifically resource utilisation at its constituent machines. Two challenges present themselves towards these objectives. First, collecting monitoring data entails substantial communication overhead. This overhead can be prohibitively high, especially in networks where bandwidth is limited. Second, forecasting models to predict resource utilisation should be accurate and need to exhibit high inference speed. Mission-critical scheduling and resource allocation algorithms use these predictions and rely on their immediate availability. To address the first challenge, we present a communication-efficient data collection mechanism. Resource utilisation data is collected at the individual machines in the system and transmitted to a central controller in batches. Each batch is processed by an adaptive data-reduction algorithm based on Fourier transforms and truncation in the frequency domain. We show that the proposed mechanism leads to a significant reduction in communication overhead while incurring only minimal error and adhering to accuracy guarantees. To address the second challenge, we propose a deep learning architecture using complex Gated Recurrent Units to forecast resource utilisation. This architecture is directly integrated with the above data collection mechanism to improve the inference speed of our forecasting model. Using two real-world datasets, we demonstrate the effectiveness of our approach, both in terms of forecasting accuracy and inference speed. Our approach resolves challenges encountered in resource provisioning frameworks and can be applied to other forecasting problems.
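
To make the data-collection idea concrete, below is a minimal Python sketch of adaptive frequency-domain truncation in the spirit of the abstract: each batch is transformed with an FFT, and the fewest coefficients are kept whose reconstruction stays within an error tolerance. The function names (compress_batch, decompress_batch) and the max_err parameter are illustrative assumptions, not the paper's interface, and the paper's accuracy guarantee may be defined differently.

```python
import numpy as np

def compress_batch(batch, max_err=0.05):
    """Truncate one batch of utilisation samples in the frequency domain.

    Keeps the fewest largest-magnitude rFFT coefficients whose
    reconstruction has relative L2 error <= max_err. `max_err` is a
    hypothetical tolerance knob, not a parameter named in the paper.
    """
    coeffs = np.fft.rfft(batch)
    order = np.argsort(np.abs(coeffs))[::-1]   # coefficients by magnitude, descending
    norm = np.linalg.norm(batch)
    for k in range(1, len(coeffs) + 1):        # linear scan; a binary search would also work
        kept = np.zeros_like(coeffs)
        idx = order[:k]
        kept[idx] = coeffs[idx]
        recon = np.fft.irfft(kept, n=len(batch))
        if norm == 0 or np.linalg.norm(batch - recon) / norm <= max_err:
            return idx, coeffs[idx]            # transmit only k (index, value) pairs
    return np.arange(len(coeffs)), coeffs      # fallback: no truncation possible

def decompress_batch(idx, vals, n):
    """Reconstruct a length-n batch at the controller from the truncated spectrum."""
    kept = np.zeros(n // 2 + 1, dtype=complex)
    kept[idx] = vals
    return np.fft.irfft(kept, n=n)

# Example: a synthetic, roughly periodic CPU-utilisation trace.
t = np.linspace(0, 1, 256, endpoint=False)
trace = 0.5 + 0.2 * np.sin(2 * np.pi * 3 * t) + 0.01 * np.random.randn(256)
idx, vals = compress_batch(trace, max_err=0.05)
recon = decompress_batch(idx, vals, len(trace))
```

Because utilisation traces are often dominated by a few periodic components, most of the energy concentrates in a handful of coefficients, which is why this kind of truncation can cut communication overhead sharply while bounding the reconstruction error.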
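The forecasting side can be sketched similarly. Below is a hedged PyTorch sketch of a GRU cell with complex-valued weights that can consume truncated Fourier coefficients directly. Applying the sigmoid and tanh separately to the real and imaginary parts is one common convention for complex-valued RNNs; the paper's exact gating and activations may differ, and ComplexGRUCell and the split_* helpers are hypothetical names.

```python
import torch
import torch.nn as nn

def split_sigmoid(z):
    # Split activation: sigmoid applied to real and imaginary parts
    # separately (a common choice for complex-valued RNNs; the paper
    # may use a different complex activation).
    return torch.complex(torch.sigmoid(z.real), torch.sigmoid(z.imag))

def split_tanh(z):
    return torch.complex(torch.tanh(z.real), torch.tanh(z.imag))

class ComplexGRUCell(nn.Module):
    """A GRU cell with complex weights, operating on Fourier coefficients.

    A sketch under the assumptions above, not the paper's exact cell.
    """

    def __init__(self, input_size, hidden_size):
        super().__init__()
        def cparam(*shape):
            # Small random complex-valued weight matrix.
            return nn.Parameter(torch.randn(*shape, dtype=torch.cfloat) * 0.1)
        self.W_z, self.U_z = cparam(input_size, hidden_size), cparam(hidden_size, hidden_size)
        self.W_r, self.U_r = cparam(input_size, hidden_size), cparam(hidden_size, hidden_size)
        self.W_h, self.U_h = cparam(input_size, hidden_size), cparam(hidden_size, hidden_size)

    def forward(self, x, h):
        z = split_sigmoid(x @ self.W_z + h @ self.U_z)        # update gate
        r = split_sigmoid(x @ self.W_r + h @ self.U_r)        # reset gate
        h_tilde = split_tanh(x @ self.W_h + (r * h) @ self.U_h)  # candidate state
        return (1 - z) * h + z * h_tilde

# Example: one step over a batch of 8 sequences of 17 kept rFFT coefficients.
cell = ComplexGRUCell(input_size=17, hidden_size=32)
x = torch.randn(8, 17, dtype=torch.cfloat)
h = torch.zeros(8, 32, dtype=torch.cfloat)
h = cell(x, h)
```

Since the cell operates on the frequency-domain representation produced by the collection mechanism, no inverse transform is needed before inference, which is one way the integration described in the abstract can reduce inference latency.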
