A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra

Paper Title

A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra

Authors

Ryan Senanayake, Fredrik Kjolstad, Changwan Hong, Shoaib Kamil, Saman Amarasinghe

Abstract

We address the problem of optimizing mixed sparse and dense tensor algebra in a compiler. We show that standard loop transformations, such as strip-mining, tiling, collapsing, parallelization and vectorization, can be applied to irregular loops over sparse iteration spaces. We also show how these transformations can be applied to the contiguous value arrays of sparse tensor data structures, which we call their position space, to unlock load-balanced tiling and parallelism. We have prototyped these concepts in the open-source TACO system, where they are exposed as a scheduling API similar to the Halide domain-specific language for dense computations. Using this scheduling API, we show how to optimize mixed sparse/dense tensor algebra expressions, how to generate load-balanced code by scheduling sparse tensor algebra in position space, and how to generate sparse tensor algebra GPU code. Our evaluation shows that our transformations let us generate good code that is competitive with many hand-optimized implementations from the literature.
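The difference between iterating in coordinate space and in position space can be illustrated with a small sketch. The code below is not TACO output and does not use the TACO scheduling API; it is a hand-written Python illustration, assuming a CSR matrix stored as hypothetical `pos`, `crd`, and `vals` arrays, with function names and the `chunk` parameter chosen for this example only. The first function loops over rows, so the work per outer iteration varies with the number of nonzeros in each row. The second strip-mines the contiguous value array into equal-sized chunks, which is the kind of load-balanced tiling the abstract attributes to position space.

```python
from bisect import bisect_right


def spmv_coordinate_space(pos, crd, vals, x):
    """Compute y = A @ x by looping over rows (coordinate space).

    Each outer iteration handles one row, so the amount of work per
    iteration depends on how many nonzeros that row holds.
    """
    y = [0.0] * (len(pos) - 1)
    for i in range(len(pos) - 1):
        for p in range(pos[i], pos[i + 1]):
            y[i] += vals[p] * x[crd[p]]
    return y


def spmv_position_space(pos, crd, vals, x, chunk=2):
    """Compute y = A @ x by strip-mining the nonzero positions.

    Each outer iteration handles at most `chunk` stored values,
    regardless of which rows they belong to.
    """
    y = [0.0] * (len(pos) - 1)
    nnz = len(vals)
    for start in range(0, nnz, chunk):
        end = min(start + chunk, nnz)
        # Recover the row that owns position `start` by searching `pos`.
        i = bisect_right(pos, start) - 1
        for p in range(start, end):
            # Advance past row boundaries (and any empty rows) as needed.
            while p >= pos[i + 1]:
                i += 1
            y[i] += vals[p] * x[crd[p]]
    return y
```

In a parallel setting, each chunk in the second version could be assigned to a separate worker, at the cost of the row search at the start of each chunk and of coordinating updates when a row spans two chunks. Both functions return the same result for the same input.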
