Paper Title

Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

Authors

Elwasif, Wael, Godoy, William, Hagerty, Nick, Harris, J. Austin, Hernandez, Oscar, Joo, Balint, Kent, Paul, Lebrun-Grandie, Damien, Maccarthy, Elijah, Melesse Vergara, Veronica G., Messer, Bronson, Miller, Ross, Opal, Sarp, Bastrakov, Sergei, Bussmann, Michael, Debus, Alexander, Steinger, Klaus, Stephan, Jan, Widera, Rene, Bryngelson, Spencer H., Le Berre, Henry, Radhakrishnan, Anand, Young, Jefferey, Chandrasekaran, Sunita, Ciorba, Florina, Simsek, Osman, Clark, Kate, Spiga, Filippo, Hammond, Jeff, Stone, John E., Hardy, David, Keller, Sebastian, Piccinali, Jean-Guillaume, Trott, Christian

Abstract

This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each equipped with a server-class Arm CPU from Ampere Computing and an A100 data center GPU from NVIDIA Corp. The systems are connected using an InfiniBand high-bandwidth, low-latency interconnect. The selected applications and mini-apps are written in several programming languages and use multiple accelerator-based programming models for GPUs, such as CUDA, OpenACC, and OpenMP offloading. Application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
