Paper Title
The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models
Paper Authors
Paper Abstract
A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out in different parameter regimes, for target functions that can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametrized regime: an early phase in which the GD dynamics closely follows that of the corresponding random feature model and the neurons are effectively quenched, followed by a late phase in which the neurons divide into two groups: a small group of "activated" neurons that dominate the dynamics and a group of background (or "quenched") neurons that support the continued activation and deactivation process. This neural-network-like behavior persists into the mildly over-parametrized regime, where it undergoes a transition to random-feature-like behavior. The quenching-activation process appears to provide a clear mechanism for "implicit regularization". This is qualitatively different from the dynamics under the "mean-field" scaling, where all neurons participate equally and no qualitative changes appear when the network parameters are varied.
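The experimental setup the abstract refers to can be sketched in a few lines of numpy: a two-layer ReLU network with Xavier-like initialization (weights scaled by the inverse square root of the fan-in), trained by full-batch gradient descent on a target that only a few neurons are needed to represent. This is a minimal illustrative sketch, not the paper's code; all sizes, the learning rate, and the target function are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper):
d, m, n = 2, 20, 100  # input dim, number of neurons, number of samples

# Target representable by just two ReLU neurons, so a relatively small
# number of neurons suffices for accurate approximation.
w_star = rng.normal(size=(2, d))
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star[0], 0) + np.maximum(X @ w_star[1], 0)

# Xavier-like initialization: scale by 1/sqrt(fan-in) at each layer.
W = rng.normal(size=(m, d)) / np.sqrt(d)  # inner-layer weights
a = rng.normal(size=m) / np.sqrt(m)       # outer-layer weights

def forward(X, a, W):
    # Two-layer network: f(x) = sum_j a_j * relu(w_j . x)
    return np.maximum(X @ W.T, 0) @ a

def loss(X, y, a, W):
    return 0.5 * np.mean((forward(X, a, W) - y) ** 2)

lr, losses = 0.05, []
for step in range(2000):
    h = np.maximum(X @ W.T, 0)   # hidden activations, shape (n, m)
    r = h @ a - y                # residuals, shape (n,)
    grad_a = h.T @ r / n
    grad_W = ((r[:, None] * (h > 0)) * a).T @ X / n
    a -= lr * grad_a             # full-batch gradient descent step
    W -= lr * grad_W
    losses.append(loss(X, y, a, W))

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Tracking the per-neuron magnitudes `np.abs(a)` (or the row norms of `W`) over training is the kind of diagnostic that would distinguish the two phases described above: in the late phase a few entries grow and dominate while the rest stay near their small initial scale.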