Paper Title
ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation
Paper Authors
Paper Abstract
Albeit being a prevalent architecture search approach, differentiable architecture search (DARTS) is largely hindered by its substantial memory cost, since the entire supernet resides in memory. This is where single-path DARTS comes in: it selects only a single-path submodel at each step. Besides being memory-friendly, it also comes with low computational cost. Nonetheless, we discover a critical issue of single-path DARTS that has gone largely unnoticed: like DARTS, it suffers from severe performance collapse, in which too many parameter-free operations such as skip connections are derived. In this paper, we propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) as a remedy. First, we disentangle the topology search from the operation search to make searching and evaluation consistent. We then adopt Gumbel-Top2 reparameterization and gradient accumulation to robustify the unwieldy bi-level optimization. We verify ROME extensively across 15 benchmarks to demonstrate its effectiveness and robustness.
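The Gumbel-Top2 reparameterization mentioned in the abstract can be illustrated with the standard Gumbel-Top-k trick: perturb each logit with i.i.d. Gumbel noise and take the two largest, which samples two distinct candidates in proportion to the softmax of the logits. Below is a minimal NumPy sketch of that trick under our own assumptions (the function name `gumbel_top2` and the example logits are ours), not the paper's actual implementation:

```python
import numpy as np

def gumbel_top2(logits, rng):
    """Sample two distinct indices via the Gumbel-Top-k trick (k=2)."""
    # Perturb each logit with i.i.d. Gumbel(0, 1) noise.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    # The indices of the two largest perturbed logits form a sample of
    # two distinct items, drawn without replacement in proportion to
    # softmax(logits).
    return np.argsort(logits + gumbel)[-2:][::-1]

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 0.1, -1.0])
picked = gumbel_top2(logits, rng)  # two distinct indices in {0,1,2,3}
```

In a single-path NAS setting, the two sampled indices would select which candidate operations participate in a given step, keeping only a small submodel in memory.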