Paper Title

Latency-Aware Differentiable Neural Architecture Search

Authors

Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Bowen Shi, Qi Tian, Hongkai Xiong

Abstract

Differentiable neural architecture search methods have become popular in recent years, mainly due to their low search cost and flexibility in designing the search space. However, these methods suffer from difficulty in optimizing the network, so the searched networks are often unfriendly to hardware. This paper addresses the problem by adding a differentiable latency loss term to the optimization, so that the search process can trade off between accuracy and latency via a balancing coefficient. The core of latency prediction is to encode each network architecture and feed it into a multi-layer regressor, whose training data can be easily collected by randomly sampling a number of architectures and evaluating them on the hardware. We evaluate our approach on NVIDIA Tesla-P100 GPUs. With 100K sampled architectures (requiring a few hours), the latency prediction module achieves a relative error lower than 10%. Equipped with this module, the search method can reduce latency by 20% while preserving accuracy. Our approach can also be transplanted to a wide range of hardware platforms with very little effort, or used to optimize other non-differentiable factors such as power consumption.
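The pipeline the abstract describes — encode an architecture, predict its latency with a small regressor, and fold that prediction into the search objective via a balancing coefficient — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the encoding scheme, layer sizes, names (`NUM_EDGES`, `NUM_OPS`, `lam`), and the untrained random weights are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed DARTS-style search space: each of NUM_EDGES edges picks one of
# NUM_OPS candidate operators; the choice is one-hot encoded and flattened.
NUM_EDGES, NUM_OPS = 14, 8

def encode(arch):
    """One-hot encode a list of per-edge operator indices into a flat vector."""
    x = np.zeros(NUM_EDGES * NUM_OPS)
    for edge, op in enumerate(arch):
        x[edge * NUM_OPS + op] = 1.0
    return x

# Two-layer regressor with random (untrained) weights, standing in for the
# multi-layer latency predictor trained on sampled (architecture, latency) pairs.
W1 = rng.normal(scale=0.1, size=(NUM_EDGES * NUM_OPS, 64))
W2 = rng.normal(scale=0.1, size=(64, 1))

def predict_latency(x):
    h = np.maximum(0.0, x @ W1)      # ReLU hidden layer
    return float(h @ W2)             # scalar latency estimate

# Search objective: accuracy loss plus a weighted latency term; because the
# predictor is a neural network, the latency term is differentiable and `lam`
# balances accuracy against latency.
def total_loss(acc_loss, latency, lam=0.1):
    return acc_loss + lam * latency

arch = rng.integers(0, NUM_OPS, size=NUM_EDGES)   # one randomly sampled architecture
lat = predict_latency(encode(arch))
loss = total_loss(acc_loss=1.23, latency=lat, lam=0.1)
```

In the actual method the regressor would be trained on the 100K sampled architectures measured on the target GPU, and its gradient with respect to the architecture parameters would flow back through the encoding during the differentiable search.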
