Title
A Dynamical Central Limit Theorem for Shallow Neural Networks
Authors
Abstract
Recent theoretical works have characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic mean-field limit when the width tends towards infinity. At initialization, the random sampling of the parameters leads to deviations from the mean-field limit dictated by the classical Central Limit Theorem (CLT). However, since gradient descent induces correlations among the parameters, it is of interest to analyze how these fluctuations evolve. Here, we use a dynamical CLT to prove that the asymptotic fluctuations around the mean-field limit remain bounded in mean square throughout training. The upper bound is given by a Monte-Carlo resampling error, with a variance that depends on the 2-norm of the underlying measure, which also controls the generalization error. This motivates the use of this 2-norm as a regularization term during training. Furthermore, if the mean-field dynamics converges to a measure that interpolates the training data, we prove that the asymptotic deviation eventually vanishes in the CLT scaling. We also complement these results with numerical experiments.
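For concreteness, a minimal sketch of the scalings the abstract refers to, under the standard mean-field parameterization assumed here (the notation $f_m$, $\bar{f}_t$, $\varphi$, $\mu_0$ is illustrative, not quoted from the paper):

\[
  f_m(x) \;=\; \frac{1}{m}\sum_{i=1}^{m}\varphi(x;\theta_i),
  \qquad \theta_i \overset{\mathrm{i.i.d.}}{\sim} \mu_0 \ \text{at initialization},
\]
\[
  \sqrt{m}\,\bigl(f_m(x) - \bar{f}_t(x)\bigr) \ \text{remains bounded in mean square throughout training},
\]

where $\bar{f}_t$ denotes the prediction of the mean-field limit at training time $t$. Equivalently, the deviation of the width-$m$ network from its mean-field limit is of order $m^{-1/2}$, the same order as a Monte-Carlo resampling error with $m$ samples.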