Paper Title


Few-shot Neural Architecture Search

Paper Authors

Yiyang Zhao, Linnan Wang, Yuandong Tian, Rodrigo Fonseca, Tian Guo

Paper Abstract

Efficient evaluation of a network architecture drawn from a large search space remains a key challenge in Neural Architecture Search (NAS). Vanilla NAS evaluates each architecture by training it from scratch, which gives the true performance but is extremely time-consuming. Recently, one-shot NAS has substantially reduced the computation cost by training only one supernetwork, a.k.a. supernet, to approximate the performance of every architecture in the search space via weight sharing. However, the performance estimation can be very inaccurate due to co-adaptation among operations. In this paper, we propose few-shot NAS, which uses multiple supernetworks, called sub-supernets, each covering a different region of the search space, to alleviate the undesired co-adaptation. Compared to one-shot NAS, few-shot NAS improves the accuracy of architecture evaluation at a small increase in evaluation cost. With only up to 7 sub-supernets, few-shot NAS establishes new SoTA results: on ImageNet, it finds models that reach 80.5% top-1 accuracy at 600 MFLOPS and 77.5% top-1 accuracy at 238 MFLOPS; on CIFAR10, it reaches 98.72% top-1 accuracy without using extra data or transfer learning. In Auto-GAN, few-shot NAS outperforms the previously published results by up to 20%. Extensive experiments show that few-shot NAS significantly improves various one-shot methods, including 4 gradient-based and 6 search-based methods on 3 different tasks in NAS-Bench-201 and NAS-Bench-1Shot1.
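To make the search-space split concrete, below is a minimal, self-contained Python sketch of the partitioning idea the abstract describes. The toy search space, the number of edges, and names like `split_search_space` are hypothetical illustrations, not the authors' implementation: the point is only that fixing the operation on one edge yields disjoint regions, each covered by its own sub-supernet.

```python
# Minimal sketch (hypothetical toy setup, not the authors' code) of the
# search-space partition behind few-shot NAS. A one-shot supernet covers
# every architecture; few-shot NAS splits the space by fixing the operation
# on one chosen edge, producing one sub-supernet per choice.
from itertools import product

OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]
NUM_EDGES = 3  # toy search space: one operation per edge

def all_architectures():
    """Enumerate the full search space: every assignment of ops to edges."""
    return list(product(OPS, repeat=NUM_EDGES))

def split_search_space(split_edge=0):
    """Partition the space into len(OPS) disjoint regions by fixing one edge.

    Weights are then shared only among architectures that agree on the fixed
    operation, which is how few-shot NAS reduces the co-adaptation seen in a
    single one-shot supernet.
    """
    regions = {op: [] for op in OPS}
    for arch in all_architectures():
        regions[arch[split_edge]].append(arch)
    return regions

if __name__ == "__main__":
    for op, archs in split_search_space().items():
        # In the paper, each sub-supernet is trained on its own region
        # before being used to rank the architectures it covers.
        print(f"sub-supernet[{op}] covers {len(archs)} architectures")
```

With 4 candidate operations this yields 4 sub-supernets; splitting on further edges refines the partition, which is consistent with the abstract's "up to 7 sub-supernets" staying a small multiple of the one-shot cost.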
