Exploring the acceleration of the Met Office NERC Cloud model using FPGAs

Paper title

Exploring the acceleration of the Met Office NERC Cloud model using FPGAs

Author

Brown, Nick

Abstract

The use of Field Programmable Gate Arrays (FPGAs) to accelerate computational kernels has the potential to be of great benefit to scientific codes and the HPC community in general. With the recent developments in FPGA programming technology, the ability to port kernels is becoming far more accessible. However, to gain reasonable performance from this technology it is not enough to simply transfer a code onto the FPGA; instead, the algorithm must be rethought and recast in a data-flow style to suit the target architecture. In this paper we describe the porting, via HLS, of one of the most computationally intensive kernels of the Met Office NERC Cloud model (MONC), an atmospheric model used by climate and weather researchers, onto an FPGA. We describe in detail the steps taken to adapt the algorithm to make it suitable for the architecture and the impact this has on kernel performance. Using a PCIe-mounted FPGA with on-board DRAM, we consider the integration of this kernel within a larger infrastructure and explore the performance characteristics of our approach in contrast to Intel CPUs that are popular in modern HPC machines, over problem sizes involving very large grids. The result of this work is an experience report detailing the challenges faced and lessons learnt in porting this complex computational kernel to FPGAs, as well as exploring the role that FPGAs can play and their fundamental limits in accelerating traditional HPC workloads.
