Paper Title
Distributed Parallelization of xPU Stencil Computations in Julia
Paper Authors
Abstract
We present a straightforward approach to the distributed parallelization of stencil-based xPU applications on a regular staggered grid, instantiated in the package ImplicitGlobalGrid.jl. The approach makes it possible to leverage remote direct memory access and enables close-to-ideal weak scaling of real-world applications on thousands of GPUs. Communication costs can easily be hidden behind computation.
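To make the idea of a stencil computation on an implicitly defined global grid concrete, the following is a minimal single-process sketch in plain Julia. It does not use ImplicitGlobalGrid.jl, MPI, or a GPU: the function names (`diffusion_step!`, `update_halo_emulated!`, `run_demo`), the 1D diffusion stencil, and the overlap of two cells between neighbouring local arrays are all assumptions chosen for illustration. It only shows that local stencil updates followed by a halo exchange reproduce the result obtained on one undivided grid.

```julia
# Illustrative single-process emulation; all names here are hypothetical
# and are not part of the ImplicitGlobalGrid.jl API.

# One explicit diffusion step on the interior cells of a 1D array.
# Boundary cells (first and last) are left untouched.
function diffusion_step!(T2, T, D, dt, dx)
    for i in 2:length(T)-1
        T2[i] = T[i] + dt * D * (T[i-1] - 2 * T[i] + T[i+1]) / dx^2
    end
    return T2
end

# Emulated halo update between neighbouring local arrays,
# assuming an overlap of 2 cells between adjacent subdomains.
function update_halo_emulated!(parts)
    for r in 1:length(parts)-1
        left, right = parts[r], parts[r+1]
        left[end] = right[2]       # right neighbour's first computed cell
        right[1]  = left[end-1]    # left neighbour's last computed cell
    end
    return parts
end

# Run the same diffusion problem on one global array and on `nprocs`
# overlapping local arrays; return the maximum absolute difference.
function run_demo(; nx=8, nprocs=3, nt=20)
    nx_g = nprocs * (nx - 2) + 2          # implied global grid size
    D, dx = 1.0, 1.0
    dt = dx^2 / D / 4.1                   # within the explicit stability limit
    T_ref = [exp(-((i - (nx_g + 1) / 2) / 3)^2) for i in 1:nx_g]
    offsets = [(r - 1) * (nx - 2) for r in 1:nprocs]
    parts = [T_ref[o+1:o+nx] for o in offsets]   # slicing makes copies
    T_ref2 = copy(T_ref)
    parts2 = [copy(T) for T in parts]
    for _ in 1:nt
        # Reference: update the undivided global array.
        diffusion_step!(T_ref2, T_ref, D, dt, dx)
        T_ref, T_ref2 = T_ref2, T_ref
        # Emulated ranks: local update, then halo exchange.
        for r in 1:nprocs
            diffusion_step!(parts2[r], parts[r], D, dt, dx)
        end
        update_halo_emulated!(parts2)
        parts, parts2 = parts2, parts
    end
    return maximum(
        maximum(abs.(parts[r] .- T_ref[offsets[r]+1:offsets[r]+nx]))
        for r in 1:nprocs
    )
end
```

Calling `run_demo()` returns the largest deviation between the split and the undivided computation. In a real distributed run, each local array would live on its own process or GPU and the exchange would go through the communication layer rather than direct array access.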