Paper Title
TSM2X: High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on GPUs
Paper Authors
Paper Abstract
Linear algebra operations are widely used in big data analytics and scientific computing. Much work has been done on optimizing linear algebra operations on GPUs for regular-shaped input. However, few works focus on fully utilizing GPU resources when the input is not regular-shaped. Current optimizations do not consider fully utilizing the memory bandwidth and computing power; therefore, they can only achieve sub-optimal performance. In this paper, we propose two efficient algorithms, TSM2R and TSM2L, for two classes of tall-and-skinny matrix-matrix multiplication on GPUs. Both focus on optimizing linear algebra operations in which at least one of the input matrices is tall-and-skinny. Specifically, TSM2R is designed for a large regular-shaped matrix multiplying a tall-and-skinny matrix, while TSM2L is designed for a tall-and-skinny matrix multiplying a small regular-shaped matrix. We implement the proposed algorithms and test them on several modern NVIDIA GPU micro-architectures. Experiments show that, compared to the current state-of-the-art works, (1) TSM2R speeds up the computation by 1.1x~3x and improves the memory bandwidth utilization and computing power utilization by 8%~47.6% and 7%~37.3%, respectively, when the regular-shaped matrix size is relatively large or medium; and (2) TSM2L speeds up the computation by 1.1x~3.5x and improves the memory bandwidth utilization by up to 55% when the regular-shaped matrix size is relatively small.
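To illustrate why the abstract reports memory bandwidth utilization alongside computing power utilization, the following minimal sketch estimates the arithmetic intensity (floating-point operations per byte moved) of the two shape classes. It is not the authors' method. The concrete shapes (an n x n matrix times an n x k matrix for the TSM2R case, an n x k matrix times a k x k matrix for the TSM2L case), the values of n and k, double precision, and the idealized traffic model in which each matrix crosses the memory bus exactly once are all assumptions made for illustration; `gemm_intensity` is a hypothetical helper.

```python
def gemm_intensity(m, k, n, bytes_per_elem=8):
    """Idealized FLOPs per byte for C[m x n] = A[m x k] @ B[k x n].

    Assumes each of A, B and C is transferred exactly once
    (a best-case model, not a measurement).
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# Hypothetical sizes chosen for illustration only.
n, k = 20480, 8

square = gemm_intensity(n, n, n)  # regular-shaped: (n x n) @ (n x n)
tsm2r = gemm_intensity(n, n, k)   # TSM2R-like: (n x n) @ (n x k)
tsm2l = gemm_intensity(n, k, k)   # TSM2L-like: (n x k) @ (k x k)

print(f"regular-shaped: {square:.1f} FLOPs/byte")
print(f"TSM2R-like:     {tsm2r:.2f} FLOPs/byte")
print(f"TSM2L-like:     {tsm2l:.2f} FLOPs/byte")
```

Under this model the regular-shaped product performs hundreds of operations per byte, while both tall-and-skinny cases perform only a few. This is consistent with the abstract's emphasis on using memory bandwidth well for these shapes, in addition to computing power.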