Paper Title
MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge
Paper Authors
Paper Abstract
Deep neural network (DNN) latency characterization is a time-consuming process that adds significant cost to Neural Architecture Search (NAS) when searching for efficient convolutional neural networks for embedded vision applications. DNN latency is a hardware-dependent metric and requires direct measurement or inference on the target hardware. A recently introduced latency estimation technique known as MAPLE predicts DNN execution time on previously unseen hardware devices by using hardware performance counters. Leveraging these hardware counters in the form of an implicit prior, MAPLE achieves state-of-the-art performance in latency prediction. Here, we propose MAPLE-X, which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency to better account for model stability and robustness. First, by identifying DNN architectures that exhibit similar latency to each other, we can generate multiple virtual examples that significantly improve accuracy over MAPLE. Second, hardware specifications are used to determine the similarity between training and test hardware, emphasizing training samples captured from comparable devices (domains) and encouraging improved domain alignment. Experimental results on a convolutional neural network NAS benchmark across different types of devices, including an Intel processor now used for embedded vision applications, demonstrate a 5% improvement over MAPLE and 9% over HELP. Furthermore, we include ablation studies to independently assess the benefits of virtual examples and hardware-based sample importance.
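The two ideas the abstract describes can be sketched in code: generating virtual examples by interpolating between architectures with similar measured latency, and weighting training samples by how similar their source device is to the test device. This is an illustrative sketch only; the function names, the relative-latency tolerance, the mixup-style interpolation, and the cosine-similarity weighting are assumptions for exposition, not the paper's exact formulation.

```python
import math
import random

def make_virtual_examples(features, latencies, tol=0.05, n_virtual=3, seed=0):
    """Generate virtual training examples by linearly interpolating between
    pairs of architectures whose measured latencies lie within a relative
    tolerance `tol` of each other (a mixup-style sketch; the paper's exact
    scheme may differ)."""
    rng = random.Random(seed)
    virt_x, virt_y = [], []
    n = len(latencies)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(latencies[i] - latencies[j]) <= tol * latencies[i]:
                for _ in range(n_virtual):
                    lam = rng.random()
                    # Interpolate both the architecture features and the latency label.
                    virt_x.append([lam * a + (1 - lam) * b
                                   for a, b in zip(features[i], features[j])])
                    virt_y.append(lam * latencies[i] + (1 - lam) * latencies[j])
    return virt_x, virt_y

def hardware_sample_weights(train_specs, test_spec):
    """Weight each training device by the cosine similarity of its
    specification vector (e.g. clock speed, core count, cache size) to the
    test device's spec vector, normalized to sum to 1."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    sims = [cos(s, test_spec) for s in train_specs]
    total = sum(sims)
    return [s / total for s in sims]
```

A predictor trained on the original samples plus the virtual examples, with per-sample loss weights from `hardware_sample_weights`, would combine both priors in the way the abstract outlines.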