Paper Title
Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent
Paper Authors
Paper Abstract
The minimization of the loss function is of paramount importance in deep neural networks. On the other hand, many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second-order stochastic Runge-Kutta method and show that it yields a consistent procedure for minimizing the loss function. In addition, it can be coupled, in an adaptive framework, with Stochastic Gradient Descent (SGD) to automatically adjust the learning rate of SGD, without requiring any additional information on the Hessian of the loss function. The adaptive SGD, called SGD-G2, is successfully tested on standard datasets.
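To illustrate the general idea described in the abstract, here is a minimal, hypothetical sketch: a plain SGD (explicit Euler) step is paired with a Heun-type second-order stochastic Runge-Kutta step on the same mini-batch, and the discrepancy between the two steps is used to adapt the learning rate, in the spirit of step-size control for ODE solvers. The function name `sgd_g2_sketch`, the error target, and the adaptation rule are illustrative assumptions, not the paper's exact SGD-G2 update.

```python
import numpy as np

def sgd_g2_sketch(grad_fn, w0, lr=0.1, steps=100, target=1e-3):
    """Hedged sketch of an adaptive SGD driven by a second-order RK step.

    grad_fn(w) returns a stochastic (mini-batch) gradient estimate at w.
    Only two gradient evaluations per step are used; no Hessian information.
    """
    w, h = w0.astype(float), lr
    for _ in range(steps):
        g1 = grad_fn(w)                          # gradient at current point
        w_euler = w - h * g1                     # plain SGD (explicit Euler) step
        g2 = grad_fn(w_euler)                    # gradient at the trial point
        w_heun = w - 0.5 * h * (g1 + g2)         # second-order (Heun / RK2) step
        err = np.linalg.norm(w_heun - w_euler)   # local error estimate
        if err > 0:
            # Illustrative adaptation rule: keep the estimated local error
            # near `target` by rescaling the learning rate.
            h = float(np.clip(h * np.sqrt(target / err), 1e-6, 1.0))
        w = w_heun                               # advance with the higher-order step
    return w, h

# Usage example: noisy quadratic loss 0.5 * ||w||^2, stochastic gradient = w + noise.
rng = np.random.default_rng(0)
noisy_grad = lambda w: w + 0.01 * rng.standard_normal(w.shape)
w_final, h_final = sgd_g2_sketch(noisy_grad, np.array([2.0, -1.0]))
print(w_final, h_final)
```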