Paper Title

Asynchronous Execution of Heterogeneous Tasks in ML-driven HPC Workflows

Authors

Pascuzzi, Vincent R.; Kilic, Ozgur O.; Turilli, Matteo; Jha, Shantenu

Abstract

Heterogeneous scientific workflows consist of numerous types of tasks that must execute on heterogeneous resources. Asynchronous execution of those tasks is crucial to improving resource utilization and task throughput and to reducing workflow makespan. Therefore, middleware capable of scheduling and executing different task types across heterogeneous resources must enable asynchronous execution of tasks. In this paper, we investigate the requirements and properties of the asynchronous task execution of machine learning (ML)-driven high performance computing (HPC) workflows. We model the degree of asynchronicity permitted for arbitrary workflows and propose key metrics that can be used to determine qualitative benefits when employing asynchronous execution. Our experiments represent relevant scientific drivers; we perform them at scale on Summit and show that the performance enhancements due to asynchronous execution are consistent with our model.
