Paper Title
Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs
Paper Authors
Abstract
Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges. In contrast to traditional deep learning, unique behaviors of the emerging GNNs are engaged with a large set of graphs and embedding data on storage, which exhibits complex and irregular preprocessing. We propose a novel deep learning framework on large graphs, HolisticGNN, that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing. To achieve the best end-to-end latency and high energy efficiency, HolisticGNN allows users to implement various GNN algorithms and directly executes them where the actual data exist in a holistic manner. It also enables RPC over PCIe such that the users can simply program GNNs through a graph semantic library without any knowledge of the underlying hardware or storage configurations. We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD). Our empirical evaluations show that the inference time of HolisticGNN outperforms GNN inference services using high-performance modern GPUs by 7.1x while reducing energy consumption by 33.2x, on average.
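To make the programming model described above concrete, the sketch below imagines how a host might drive near-storage GNN inference through an RPC-style client: graph and embedding data stay on the device side, and the host only issues calls against a handle. All names here (`CSSDClient`, `load_graph`, `infer`) are illustrative assumptions for this note, not the paper's actual graph semantic library; the RPC-over-PCIe transport is simulated with in-memory state.

```python
# Hypothetical sketch of HolisticGNN's host-side programming model.
# Assumed names: CSSDClient, load_graph, infer (not the paper's real API).

class CSSDClient:
    """Stands in for an RPC stub to the computational SSD (CSSD)."""

    def __init__(self):
        # In the paper the transport is RPC over PCIe; here device-side
        # storage is simulated with a plain dict.
        self._graphs = {}

    def load_graph(self, name, edges, features):
        # Graph topology and embeddings live "near storage";
        # the host keeps only an opaque handle.
        self._graphs[name] = (edges, features)
        return name

    def infer(self, handle, target):
        # A single mean-aggregation GNN layer, executed device-side:
        # average the embedding vectors of the target node's neighbors.
        edges, features = self._graphs[handle]
        neighbors = [dst for src, dst in edges if src == target]
        if not neighbors:
            return []
        dim = len(next(iter(features.values())))
        agg = [0.0] * dim
        for n in neighbors:
            for i, v in enumerate(features[n]):
                agg[i] += v
        return [v / len(neighbors) for v in agg]

client = CSSDClient()
h = client.load_graph(
    "toy",
    edges=[(0, 1), (0, 2)],
    features={1: [2.0, 4.0], 2: [4.0, 8.0]},
)
print(client.infer(h, 0))  # mean of neighbors 1 and 2 -> [3.0, 6.0]
```

The point of the sketch is the division of labor the abstract claims: the host never ships the graph or embeddings back and forth per query, only small RPC requests and results, which is what makes near-storage execution attractive for storage-resident, irregularly accessed graph data.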