Paper Title

On the training dynamics of deep networks with $L_2$ regularization

Paper Authors

Aitor Lewkowycz, Guy Gur-Ari

Paper Abstract

We study the role of $L_2$ regularization in deep learning, and uncover simple relations between the performance of the model, the $L_2$ coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of $L_2$ regularization in this context with that of linear models.
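For a loss $L(\theta)$ trained with an $L_2$ coefficient $\lambda$, the regularized gradient flow takes the standard form $\dot{\theta} = -\nabla L(\theta) - \lambda \theta$. As a minimal sketch of the kind of dynamical schedule the abstract describes, the code below trains an overparameterized linear least-squares model by gradient descent while decaying the $L_2$ coefficient over time; the decay form $\lambda_t = \lambda_0 / (1 + t/t_0)$ and all constants are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical illustration (not the paper's exact schedule): gradient descent
# with an L2 coefficient that decays over training, on an overparameterized
# linear least-squares problem.

rng = np.random.default_rng(0)
n_samples, n_features = 100, 200           # overparameterized: features > samples
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true

w = np.zeros(n_features)
lr = 1e-2                                  # learning rate
lam0, t0 = 1e-2, 100.0                     # initial L2 coefficient and decay scale (assumed)

for t in range(5000):
    lam_t = lam0 / (1.0 + t / t0)          # decaying L2 coefficient (assumed form)
    grad = X.T @ (X @ w - y) / n_samples   # gradient of the mean squared loss
    w -= lr * (grad + lam_t * w)           # L2 term contributes lam_t * w to the update

print("final training loss:", np.mean((X @ w - y) ** 2))
```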
