GPU-parallelisation of wavelet-based grid adaptation for fast finite volume modelling: application to shallow water flows

Paper title

GPU-parallelisation of wavelet-based grid adaptation for fast finite volume modelling: application to shallow water flows

Authors

Alovya Ahmed Chowdhury, Georges Kesserwani, Charles Rougé, Paul Richmond

Abstract

Wavelet-based grid adaptation driven by the "multiresolution analysis" (MRA) of the Haar wavelet (HW) makes it possible to devise an adaptive first-order finite volume (FV1) model (HWFV1) that readily preserves the modelling fidelity of its reference uniform-grid FV1 counterpart. However, the MRA incurs a high computational cost because it involves "encoding" (coarsening), "decoding" (refining), analysing and traversing modelled data across a deep hierarchy of nested, uniform grids. GPU-parallelisation of the MRA is needed to reduce this cost, but its algorithmic structure (1) hinders coalesced memory access on the GPU, and (2) involves an inherently sequential tree traversal problem. This work redesigns the algorithmic structure of the MRA in order to parallelise it on the GPU, addressing (1) by applying Z-order space-filling curves and (2) by adopting a parallel tree traversal algorithm. The result is a GPU-parallelised HWFV1 model (GPU-HWFV1). GPU-HWFV1 is verified against its CPU predecessor (CPU-HWFV1) and its GPU-parallelised reference uniform-grid counterpart (GPU-FV1) over five shallow water flow test cases. GPU-HWFV1 preserves the modelling fidelity of GPU-FV1 while being up to 30 times faster. Compared to CPU-HWFV1, it is up to 200 times faster, suggesting that the GPU-parallelised MRA could be used to speed up other FV1 models.
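
To illustrate why a Z-order space-filling curve helps with point (1), the following is a minimal Python sketch, not the authors' implementation. It assumes square grids indexed by integer cell coordinates `(i, j)`; the function names `part1by1`, `morton_index`, `haar_encode_block` and `encode_level` are hypothetical, and the unnormalised average/detail formulas are a common textbook form of the 2D Haar transform that may differ from the coefficients used in the paper. The point it shows is that, once cells are stored in Z-order, the four children of any parent cell occupy four consecutive array entries, so one level of "encoding" becomes a contiguous, independent operation per parent.

```python
def part1by1(n: int) -> int:
    """Spread the lower 16 bits of n so a zero bit separates each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(i: int, j: int) -> int:
    """Z-order index of cell (i, j): bits of i in even positions, bits of j in odd."""
    return part1by1(i) | (part1by1(j) << 1)


def haar_encode_block(a, b, c, d):
    """Coarsen four children into one average and three detail coefficients.

    Children are ordered as in Z-order: a=(0,0), b=(1,0), c=(0,1), d=(1,1).
    Assumed unnormalised Haar form, chosen for illustration only.
    """
    avg = (a + b + c + d) / 4
    d_h = (a - b + c - d) / 4   # horizontal detail
    d_v = (a + b - c - d) / 4   # vertical detail
    d_d = (a - b - c + d) / 4   # diagonal detail
    return avg, (d_h, d_v, d_d)


def encode_level(fine):
    """Encode one refinement level of a Z-ordered array into the next coarser one.

    Parent p reads only fine[4p : 4p + 4], so each parent could be handled by
    its own GPU thread with neighbouring threads reading neighbouring memory.
    """
    coarse, details = [], []
    for p in range(len(fine) // 4):
        avg, det = haar_encode_block(*fine[4 * p: 4 * p + 4])
        coarse.append(avg)
        details.append(det)
    return coarse, details
```

For example, the children of parent cell `(I, J)` are the cells `(2I + di, 2J + dj)` with `di, dj` in `{0, 1}`, and their Z-order indices are `4 * morton_index(I, J) + di + 2 * dj`. A block whose details are all zero (a locally flat solution) is the kind of region an adaptive scheme can leave coarse.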
