Paper Title

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

Paper Authors

Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

Paper Abstract

At the heart of deep learning training and inference are computationally intensive primitives such as convolutions, which form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high-performance implementations of deep learning kernels: 1) library development, exemplified by Intel MKL-DNN for CPUs, and 2) automatic compilation, represented by the TensorFlow XLA compiler. Both approaches have drawbacks: even though a custom-built library can deliver very good performance, the cost and time of developing the library can be high. Automatic compilation of kernels is attractive, but in practice, to date, automatically generated implementations have lagged expert-coded kernels in performance by orders of magnitude. In this paper, we develop a hybrid solution to the development of deep learning kernels that achieves the best of both worlds: expert-coded microkernels are used for the innermost loops of kernels, and advanced polyhedral technology is used to automatically tune the outer loops for performance. We design a novel polyhedral-model-based data reuse algorithm to optimize the outer loops of the kernel. Through experimental evaluation on an important class of deep learning primitives, namely convolutions, we demonstrate that the approach we develop attains the same levels of performance as Intel MKL-DNN, a hand-coded deep learning library.
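To make the abstract's central idea concrete — an expert-coded microkernel fixed at the innermost loops, with only the outer loops left for the polyhedral optimizer to reorder and tile for data reuse — here is a minimal C sketch. This is an illustration under stated assumptions, not the paper's actual code: the function names (`microkernel`, `conv_fwd`), the blocked-layout constants, and the plain-C microkernel body are all hypothetical. In PolyScientist the innermost block would be an expert-written (e.g., JIT-generated assembly) kernel, and the order and tiling of the outer loops would be chosen by the polyhedral data-reuse analysis rather than fixed as below.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative blocked tensor sizes (not from the paper).
   Channels are split into blocks of width B, in the NCHWc style
   used by CPU libraries such as Intel MKL-DNN. */
enum { N = 1, Cb = 2, Kb = 2, B = 16,   /* batch, in/out channel blocks */
       H = 8, W = 8, R = 3, S = 3,      /* image and filter sizes        */
       Ho = H - R + 1, Wo = W - S + 1 };/* output sizes (no pad, stride 1) */

/* Stand-in for the expert-coded microkernel: computes one output row
   (all Wo pixels, all B output channels of one block) for a fixed
   (image, output-channel block, input-channel block).  The plain-C
   body here is only a functional placeholder for the hand-tuned code. */
static void microkernel(float in[H][W][B], float wt[R][S][B][B],
                        float out[Wo][B], int oh) {
    for (int r = 0; r < R; r++)
        for (int s = 0; s < S; s++)
            for (int ow = 0; ow < Wo; ow++)
                for (int c = 0; c < B; c++)     /* input channel in block  */
                    for (int k = 0; k < B; k++) /* output channel in block */
                        out[ow][k] += in[oh + r][ow + s][c] * wt[r][s][c][k];
}

/* Outer loop nest of a blocked forward convolution: these are the loops
   a polyhedral optimizer would permute and tile for data reuse, while
   the call below stays the fixed innermost unit of work. */
static void conv_fwd(float in[N][Cb][H][W][B],
                     float wt[Kb][Cb][R][S][B][B],
                     float out[N][Kb][Ho][Wo][B]) {
    for (int n = 0; n < N; n++)
        for (int kb = 0; kb < Kb; kb++)
            for (int cb = 0; cb < Cb; cb++)   /* reduction over input blocks */
                for (int oh = 0; oh < Ho; oh++)
                    microkernel(in[n][cb], wt[kb][cb], out[n][kb][oh], oh);
}

int main(void) {
    static float in[N][Cb][H][W][B], wt[Kb][Cb][R][S][B][B],
                 out[N][Kb][Ho][Wo][B];
    /* Fill inputs with a simple deterministic pattern. */
    for (size_t i = 0; i < sizeof in / sizeof(float); i++)
        ((float *)in)[i] = (float)(i % 7) * 0.1f;
    for (size_t i = 0; i < sizeof wt / sizeof(float); i++)
        ((float *)wt)[i] = (float)(i % 5) * 0.01f;
    memset(out, 0, sizeof out);
    conv_fwd(in, wt, out);
    printf("out[0][0][0][0][0] = %f\n", out[0][0][0][0][0]);
    return 0;
}
```

Because the microkernel call is opaque to the optimizer, the search space reduces to orderings and tilings of the four outer loops, which is what makes automatic tuning tractable while preserving microkernel-level performance.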
