Paper Title

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Paper Authors

Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang

Paper Abstract

Advanced deep neural networks (DNNs), designed by either humans or AutoML algorithms, are growing increasingly complex. Diverse operations are connected by complicated connectivity patterns, e.g., various types of skip connections. These topological compositions are empirically effective and are observed, in general, to smooth the loss landscape and facilitate gradient flow. However, it remains elusive to derive any principled understanding of their effects on DNN capacity or trainability, or to understand why, or in which respect, one specific connectivity pattern is better than another. In this work, we theoretically characterize, at a fine granularity, the impact of connectivity patterns on the convergence of DNNs under gradient descent training. By analyzing a wide network's Neural Network Gaussian Process (NNGP), we are able to depict how the spectrum of the NNGP kernel propagates through a particular connectivity pattern, and how that affects the bound on the convergence rate. As one practical implication of our results, we show that by simply filtering out "unpromising" connectivity patterns, we can trim down the number of models to evaluate and significantly accelerate large-scale neural architecture search without any overhead. Code is available at: https://github.com/VITA-Group/architecture_convergence.
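To make the abstract's central quantity concrete, below is a minimal NumPy sketch, not the paper's released code, of the kind of computation the analysis tracks: it propagates an NNGP kernel through a stack of wide fully-connected ReLU layers (the closed-form degree-1 arc-cosine kernel recursion) and reads off the kernel's smallest eigenvalue, a standard proxy for the gradient-descent convergence-rate bound. The residual rule used here (kernels of assumed-independent branches add) is a simplifying assumption, and the function names `nngp_relu_layer` and `propagate_kernel` are illustrative, not from the paper's repository.

```python
import numpy as np

def nngp_relu_layer(K, sigma_w2=2.0, sigma_b2=0.0):
    # One NNGP update for a wide fully-connected ReLU layer,
    # using the closed-form degree-1 arc-cosine kernel.
    d = np.sqrt(np.diag(K))                         # per-input standard deviations
    cos_t = np.clip(K / np.outer(d, d), -1.0, 1.0)  # pairwise correlations
    theta = np.arccos(cos_t)
    J = (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
    return sigma_w2 * np.outer(d, d) * J + sigma_b2

def propagate_kernel(X, depth=8, skip=False):
    # Propagate the NNGP kernel through `depth` ReLU layers. With `skip`,
    # apply a simplified residual rule: kernels of (assumed independent)
    # branches add, so K <- layer(K) + K.
    K = X @ X.T / X.shape[1]                        # input (linear) kernel
    for _ in range(depth):
        K = nngp_relu_layer(K) + (K if skip else 0.0)
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # unit-norm inputs

for skip in (False, True):
    K = propagate_kernel(X, depth=8, skip=skip)
    lam_min = np.linalg.eigvalsh(K)[0]              # smallest NNGP eigenvalue
    print(f"skip={skip}:  lambda_min(K) = {lam_min:.4e}")
```

In the same spirit, the NAS filtration step the abstract mentions could rank candidate connectivity patterns by such a spectrum-based score and discard the lowest-scoring ones before any training.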
