Paper Title
Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs
Paper Authors
Abstract
Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges. In contrast to traditional deep learning, unique behaviors of the emerging GNNs are engaged with a large set of graphs and embedding data on storage, which exhibits complex and irregular preprocessing. We propose a novel deep learning framework on large graphs, HolisticGNN, that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing. To achieve the best end-to-end latency and high energy efficiency, HolisticGNN allows users to implement various GNN algorithms and directly executes them where the actual data exist in a holistic manner. It also enables RPC over PCIe such that the users can simply program GNNs through a graph semantic library without any knowledge of the underlying hardware or storage configurations. We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD). Our empirical evaluations show that the inference time of HolisticGNN outperforms GNN inference services using high-performance modern GPUs by 7.1x while reducing energy consumption by 33.2x, on average.
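To make the programming model described above concrete, the sketch below imagines how a host might drive near-storage GNN inference through an RPC-style client: graph and embedding data stay on the device side, and the host only issues calls against a handle. All names here (`CSSDClient`, `load_graph`, `infer`) are illustrative assumptions for this note, not the paper's actual graph semantic library; the RPC-over-PCIe transport is simulated with in-memory state.

```python
# Hypothetical sketch of HolisticGNN's host-side programming model.
# Assumed names: CSSDClient, load_graph, infer (not the paper's real API).

class CSSDClient:
    """Stands in for an RPC stub to the computational SSD (CSSD)."""

    def __init__(self):
        # In the paper the transport is RPC over PCIe; here device-side
        # storage is simulated with a plain dict.
        self._graphs = {}

    def load_graph(self, name, edges, features):
        # Graph topology and embeddings live "near storage";
        # the host keeps only an opaque handle.
        self._graphs[name] = (edges, features)
        return name

    def infer(self, handle, target):
        # A single mean-aggregation GNN layer, executed device-side:
        # average the embedding vectors of the target node's neighbors.
        edges, features = self._graphs[handle]
        neighbors = [dst for src, dst in edges if src == target]
        if not neighbors:
            return []
        dim = len(next(iter(features.values())))
        agg = [0.0] * dim
        for n in neighbors:
            for i, v in enumerate(features[n]):
                agg[i] += v
        return [v / len(neighbors) for v in agg]

client = CSSDClient()
h = client.load_graph(
    "toy",
    edges=[(0, 1), (0, 2)],
    features={1: [2.0, 4.0], 2: [4.0, 8.0]},
)
print(client.infer(h, 0))  # mean of neighbors 1 and 2 -> [3.0, 6.0]
```

The point of the sketch is the division of labor the abstract claims: the host never ships the graph or embeddings back and forth per query, only small RPC requests and results, which is what makes near-storage execution attractive for storage-resident, irregularly accessed graph data.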