Paper Title

DNNAbacus: Toward Accurate Computational Cost Prediction for Deep Neural Networks

Paper Authors

Lu Bai, Weixing Ji, Qinyuan Li, Xilai Yao, Wei Xin, Wanyi Zhu

Paper Abstract

Deep learning is attracting interest across a variety of domains, including natural language processing, speech recognition, and computer vision. However, model training is time-consuming and requires huge computational resources. Existing works on the performance prediction of deep neural networks, which mostly focus on the training time prediction of a few models, rely on analytical models and result in high relative errors. This paper investigates the computational resource demands of 29 classical deep neural networks and builds accurate models for predicting computational costs. We first analyze the profiling results of typical networks and demonstrate that the computational resource demands of models with different inputs and hyperparameters are neither obvious nor intuitive. We then propose a lightweight prediction approach, DNNAbacus, with a novel network structural matrix for network representation. DNNAbacus accurately predicts both the memory and time costs of PyTorch and TensorFlow models, generalizes to different hardware architectures, and has zero-shot capability for unseen networks. Our experimental results show that the mean relative error (MRE) is 0.9% with respect to time and 2.8% with respect to memory for 29 classic models, which is much lower than that of state-of-the-art works.
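The MRE metric quoted above is the standard mean relative error: the average of the absolute prediction error divided by the true value. A minimal sketch for clarity (the function name and sample values below are illustrative, not taken from the paper):

```python
def mean_relative_error(predictions, targets):
    """Mean relative error: average of |pred - true| / |true| over all samples."""
    return sum(abs(p - t) / abs(t) for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical predicted vs. measured training times (seconds)
pred = [102.0, 51.0, 198.0]
true = [100.0, 50.0, 200.0]
print(round(mean_relative_error(pred, true), 4))  # → 0.0167, i.e. ~1.7% MRE
```

An MRE of 0.9% for time thus means the predicted cost deviates from the measured cost by under 1% on average across the 29 models.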
