Paper Title

ScaleNet: Searching for the Model to Scale

Paper Authors

Jiyang Xie, Xiu Su, Shan You, Zhanyu Ma, Fei Wang, Chen Qian

Paper Abstract

Recently, the community has paid increasing attention to model scaling and contributed to developing model families with a wide spectrum of scales. Current methods either simply resort to a one-shot NAS manner to construct a non-structural and non-scalable model family, or rely on a manual yet fixed scaling strategy to scale a base model that is not necessarily optimal. In this paper, we bridge these two components and propose ScaleNet to jointly search the base model and the scaling strategy, so that the scaled large models can achieve more promising performance. Concretely, we design a super-supernet to embody models across a wide spectrum of sizes (e.g., FLOPs). The scaling strategy can then be learned interactively with the base model via a Markov chain-based evolution algorithm, and generalized to develop even larger models. To obtain a decent super-supernet, we design a hierarchical sampling strategy to enhance its training sufficiency and alleviate disturbance. Experimental results show that our scaled networks enjoy significant performance superiority at various FLOPs levels, while requiring at least 2.53x less search cost. Code is available at https://github.com/luminolx/ScaleNet.
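As a rough illustration of how a Markov chain-based evolutionary search over scaling strategies might work, below is a minimal Python sketch: candidate strategies (depth/width/resolution multipliers) are mutated, and the probability of mutating each dimension is reinforced whenever that mutation improves a toy fitness proxy under a FLOPs budget. The CHOICES grid, the flops and fitness functions, and all other names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Markov chain-style evolutionary search over
# scaling strategies. Everything here (the choice grid, the toy cost and
# fitness models) is an illustrative assumption, not the paper's code.
import random

# Candidate multipliers for each scaling dimension (assumed values).
CHOICES = {
    "depth": [1.0, 1.2, 1.4, 1.6],
    "width": [1.0, 1.1, 1.2, 1.3],
    "res":   [1.0, 1.15, 1.3],
}

def flops(s):
    # Toy cost model: FLOPs grow roughly as depth * width^2 * resolution^2.
    return s["depth"] * s["width"] ** 2 * s["res"] ** 2

def fitness(s, budget=2.5):
    # Toy accuracy proxy: reward larger models, reject over-budget ones.
    if flops(s) > budget:
        return 0.0
    return 0.3 * s["depth"] + 0.4 * s["width"] + 0.3 * s["res"]

def mutate(strategy, trans):
    # Choose which dimension to perturb according to the current
    # transition weights (the Markov-chain flavour), then move that
    # dimension to a random alternative choice.
    dims = list(trans)
    dim = random.choices(dims, weights=list(trans.values()))[0]
    child = dict(strategy)
    child[dim] = random.choice(CHOICES[dim])
    return child, dim

def search(steps=200):
    trans = {dim: 1.0 for dim in CHOICES}            # uniform start
    cur = {dim: opts[0] for dim, opts in CHOICES.items()}
    best, best_fit = cur, fitness(cur)
    for _ in range(steps):
        child, dim = mutate(cur, trans)
        f = fitness(child)
        # Reinforce dimensions whose mutations helped, decay the rest.
        trans[dim] = max(0.1, trans[dim] + (0.2 if f > best_fit else -0.05))
        if f > best_fit:
            best, best_fit, cur = child, f, child
    return best, best_fit

if __name__ == "__main__":
    random.seed(0)
    print(search())
```

In the actual method, candidate quality would be estimated by evaluating subnetworks within the trained super-supernet rather than by a closed-form proxy as above.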
