Paper Title

Big Data Staging with MPI-IO for Interactive X-ray Science

Authors

Justin M. Wozniak, Hemant Sharma, Timothy G. Armstrong, Michael Wilde, Jonathan D. Almer, Ian Foster

Abstract

New techniques in X-ray scattering science experiments produce large data sets whose analysis can require millions of hours of high-performance computing per week. In such applications, data is typically moved from X-ray detectors to a large parallel file system shared by all nodes of a petascale supercomputer and is then read repeatedly as different science application tasks proceed. However, this straightforward implementation causes significant contention in the file system. We propose an alternative approach in which data is instead staged into and cached in compute-node memory for extended periods, during which various processing tasks can access it efficiently. We describe such a big data staging framework, based on MPI-IO and the Swift parallel scripting language. We discuss a range of large-scale data management issues in X-ray scattering science and measure the performance benefits of the new staging framework for high-energy diffraction microscopy, an important emerging application in data-intensive X-ray scattering. We show that our framework reduces scientific processing turnaround from three months to under 10 minutes, and that our I/O technique reduces input overheads by a factor of 5 on 8K Blue Gene/Q nodes.
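The core idea in the abstract is to read input data from the shared file system once, keep it in compute-node memory, and let many analysis tasks reuse it. This avoids having every task hit the file system again. The sketch below shows only that access-pattern difference. It is a single-process Python simulation, not the authors' MPI-IO/Swift implementation. The class and function names (SharedFileSystem, run_direct, stage, run_staged) and the frame_*.bin inputs are hypothetical and exist only for illustration.

```python
from collections import Counter


class SharedFileSystem:
    """Hypothetical stand-in for a shared parallel file system that counts reads."""

    def __init__(self, files):
        self._files = dict(files)
        self.reads = Counter()

    def read(self, name):
        self.reads[name] += 1  # every call models one request to the shared FS
        return self._files[name]


def run_direct(fs, tasks):
    """Baseline: each task reads all of its inputs straight from the shared FS."""
    results = []
    for inputs in tasks:
        data = [fs.read(name) for name in inputs]
        results.append(sum(len(d) for d in data))
    return results


def stage(fs, names):
    """Read each needed file exactly once into a (simulated) in-memory cache."""
    return {name: fs.read(name) for name in names}


def run_staged(fs, tasks):
    """Staged: inputs are cached once, then every task reads from memory."""
    needed = sorted({name for inputs in tasks for name in inputs})
    cache = stage(fs, needed)
    results = []
    for inputs in tasks:
        data = [cache[name] for name in inputs]
        results.append(sum(len(d) for d in data))
    return results


# Hypothetical workload: 10 analysis tasks, each reading the same 4 detector frames.
files = {f"frame_{i:03d}.bin": bytes(1024) for i in range(4)}
tasks = [list(files) for _ in range(10)]

fs_direct = SharedFileSystem(files)
fs_staged = SharedFileSystem(files)
direct = run_direct(fs_direct, tasks)
staged = run_staged(fs_staged, tasks)
print(sum(fs_direct.reads.values()), sum(fs_staged.reads.values()))  # 40 4
```

Both strategies compute the same results. The number of shared-file-system reads, however, drops from tasks × files to files. In the real framework, the single read would be a parallel MPI-IO operation whose data is distributed into node memory across many nodes, which this toy model does not attempt to reproduce.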