Paper Title

Towards Advanced Monitoring for Scientific Workflows

Authors

Bader, Jonathan; Witzke, Joel; Becker, Soeren; Lößer, Ansgar; Lehmann, Fabian; Doehler, Leon; Vu, Anh Duc; Kao, Odej

Abstract

Scientific workflows consist of thousands of highly parallelized tasks executed in a distributed environment involving many components. Because the resulting volume of data cannot be analyzed manually, the performance metrics, traces, and behavior of these components and tasks must be traced and investigated automatically so that end users can work at a suitable level of abstraction. The execution and monitoring of scientific workflows involve many components: the cluster infrastructure, its resource manager, the workflow, and the workflow tasks. Each component in such an execution environment has access to different monitoring metrics and provides metrics at a different level of abstraction. Combining and analyzing the metrics observed by different components, together with their interdependencies, is still largely overlooked. We specify four monitoring layers that can serve as an architectural blueprint for the monitoring responsibilities and the interactions of components in the scientific workflow execution context. We describe the monitoring metrics associated with each of the four layers and how the layers interact. Finally, we examine five state-of-the-art scientific workflow management systems (SWMS) to assess which steps are needed to enable our four-layer-based approach.
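The abstract does not spell out how metrics from different layers are combined. As a hedged illustration of the cross-component correlation it motivates, the minimal Python sketch below tags each metric with the component that emitted it and joins task-level metrics with the infrastructure metrics of the node that ran each task. The layer names are taken from the components listed in the abstract, not from the paper's own layer definitions, and `Layer`, `Metric`, `correlate`, and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    # Hypothetical layers, named after the components listed in the abstract
    INFRASTRUCTURE = "cluster infrastructure"
    RESOURCE_MANAGER = "resource manager"
    WORKFLOW = "workflow"
    TASK = "workflow task"


@dataclass(frozen=True)
class Metric:
    layer: Layer
    source: str  # emitting component, e.g. a node name or a task id
    name: str
    value: float


def correlate(metrics, task_to_node):
    """Merge each task's own metrics with the infrastructure metrics of its host node."""
    # Collect infrastructure-level metrics per node
    node_metrics = defaultdict(dict)
    for m in metrics:
        if m.layer is Layer.INFRASTRUCTURE:
            node_metrics[m.source][m.name] = m.value

    combined = {}
    for m in metrics:
        if m.layer is not Layer.TASK:
            continue
        node = task_to_node[m.source]  # assumed to be known, e.g. from the resource manager
        if m.source not in combined:
            row = {"node": node}
            for name, value in node_metrics[node].items():
                row[f"node.{name}"] = value
            combined[m.source] = row
        combined[m.source][m.name] = m.value
    return combined


# Hypothetical example: a slow task on a saturated node
metrics = [
    Metric(Layer.INFRASTRUCTURE, "node-1", "cpu_util", 0.95),
    Metric(Layer.INFRASTRUCTURE, "node-2", "cpu_util", 0.30),
    Metric(Layer.TASK, "align_1", "runtime_s", 120.0),
    Metric(Layer.TASK, "align_2", "runtime_s", 40.0),
]
result = correlate(metrics, {"align_1": "node-1", "align_2": "node-2"})
print(result)
```

Joining on a task-to-node mapping is only one assumed way to relate layers; the interaction model the paper actually proposes may differ.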