Paper title

Deploying large fixed file datasets with SquashFS and Singularity

Authors

Pierre Rioux, Gregory Kiar, Alexandre Hutton, Alan C. Evans, Shawn T. Brown

Abstract

Shared high-performance computing (HPC) platforms, such as those provided by XSEDE and Compute Canada, enable researchers to carry out large-scale computational experiments at a fraction of the cost of the cloud. Most systems require the use of distributed filesystems (e.g. Lustre) for providing a highly multi-user, large capacity storage environment. These filesystems suffer performance penalties as the number of files increases, owing to network contention and limited metadata performance. We demonstrate how a combination of two technologies, Singularity and SquashFS, can help developers, integrators, architects, and scientists deploy large datasets (O(10M) files) on these shared systems with minimal performance limitations. The proposed integration enables more efficient access and indexing than normal file-based dataset installations, while providing transparent file access to users and processes. Furthermore, the approach does not require administrative privileges on the target system. While the examples studied here have been taken from the field of neuroimaging, the technologies adopted are not specific to that field. Currently, this solution is limited to read-only datasets. We propose the adoption of this technology for the consumption and dissemination of community datasets across shared computing resources.
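As a rough illustration of the workflow the abstract describes, the sketch below only assembles command lines and does not run them. It assumes the standard `mksquashfs` tool and a Singularity 3.x installation that accepts read-only SquashFS images via `--overlay`. The helper names (`build_squashfs_cmd`, `build_overlay_cmd`) and the paths (`/scratch/mydataset`, `tools.sif`) are hypothetical. The idea is that a directory tree of many small files is packed into a single image file, so the shared filesystem stores one large file, while processes inside the container still see an ordinary directory tree.

```python
import shlex
from typing import List


def build_squashfs_cmd(dataset_dir: str, image_path: str) -> List[str]:
    """Hypothetical helper: command that packs a whole directory tree into
    one SquashFS image file.

    -keep-as-directory: the image contains the top-level directory itself,
    so files keep a predictable path prefix once overlaid.
    -noappend: overwrite an existing image instead of appending to it.
    """
    return ["mksquashfs", dataset_dir, image_path,
            "-keep-as-directory", "-noappend"]


def build_overlay_cmd(container: str, images: List[str],
                      command: List[str]) -> List[str]:
    """Hypothetical helper: command that runs `command` inside a Singularity
    container with each SquashFS image attached as a read-only overlay."""
    cmd = ["singularity", "exec"]
    for image in images:
        # ':ro' marks the overlay read-only; SquashFS images cannot be written.
        cmd += ["--overlay", f"{image}:ro"]
    return cmd + [container] + command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    pack = build_squashfs_cmd("/scratch/mydataset", "mydataset.squashfs")
    run = build_overlay_cmd("tools.sif", ["mydataset.squashfs"],
                            ["ls", "/mydataset"])
    print(shlex.join(pack))
    print(shlex.join(run))
```

Running the script prints `mksquashfs /scratch/mydataset mydataset.squashfs -keep-as-directory -noappend` followed by `singularity exec --overlay mydataset.squashfs:ro tools.sif ls /mydataset`. Several images can be attached by repeating `--overlay`, which is one way a very large dataset might be split across multiple image files. The overlays are read-only, consistent with the abstract's restriction to read-only datasets.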