Paper Title
A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks
Paper Authors
Paper Abstract
We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime. As the network's widths increase, the network's learning trajectory is shown to be well captured by a meaningful and dynamically nonlinear limit (the \textit{mean field} limit), which is characterized by a system of ODEs. Our framework applies to a broad range of network architectures, learning dynamics, and network initializations. Central to the framework is the new idea of a \textit{neuronal embedding}, which comprises a non-evolving probability space that allows one to embed neural networks of arbitrary widths. Using our framework, we prove several properties of large-width multilayer neural networks. First, we show that independent and identically distributed (i.i.d.) initializations cause strong degeneracy effects on the network's learning trajectory when the network's depth is at least four. Second, we obtain several global convergence guarantees for feedforward multilayer networks under a number of different setups. These include two-layer and three-layer networks with i.i.d. initializations, and multilayer networks of arbitrary depths with a special type of correlated initialization that is motivated by the new concept of \textit{bidirectional diversity}. Unlike previous works that rely on convexity, our results admit non-convex losses and hinge on a certain universal approximation property, a distinctive feature of infinite-width neural networks that is shown to hold throughout the training process. Aside from being the first known global convergence results for multilayer networks in the mean field regime, they demonstrate the flexibility of our framework and incorporate several new ideas and insights that depart from conventional convex optimization wisdom.
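To make concrete what "characterized by a system of ODEs" means, below is a minimal sketch of such a mean field dynamics in the simplest two-layer case, written in the neuronal-embedding style the abstract alludes to. The notation here (the probability space $(\Omega, \mathcal{F}, P)$, loss $\mathcal{L}$, activation $\sigma$) is assumed for illustration and is not taken verbatim from the paper.

% Hypothetical sketch (assumed notation): a two-layer network whose neurons are
% indexed by samples c from a fixed, non-evolving probability space (\Omega, \mathcal{F}, P).
% \partial_2 \mathcal{L} denotes the derivative of the loss in its second (prediction) argument.
\begin{align*}
  \hat{y}(t, x) &= \mathbb{E}_{C \sim P}\!\left[ a(t, C)\,\sigma(\langle w(t, C), x\rangle) \right], \\
  \partial_t\, a(t, c) &= -\,\mathbb{E}_{(X, Y)}\!\left[ \partial_2 \mathcal{L}\big(Y, \hat{y}(t, X)\big)\,\sigma(\langle w(t, c), X\rangle) \right], \\
  \partial_t\, w(t, c) &= -\,\mathbb{E}_{(X, Y)}\!\left[ \partial_2 \mathcal{L}\big(Y, \hat{y}(t, X)\big)\, a(t, c)\,\sigma'(\langle w(t, c), X\rangle)\, X \right].
\end{align*}

Under this sketch, a width-$n$ network trained by suitably scaled stochastic gradient descent corresponds to drawing $n$ i.i.d. indices $c_1, \dots, c_n \sim P$ and discretizing these equations; the mean field limit statement is that the two trajectories remain close as $n \to \infty$.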