Paper Title
Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent
Paper Authors
Paper Abstract
The minimization of the loss function is of paramount importance in deep neural networks. On the other hand, many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second-order stochastic Runge-Kutta method and show that it yields a consistent procedure for minimizing the loss function. In addition, it can be coupled, in an adaptive framework, with Stochastic Gradient Descent (SGD) to automatically adjust the learning rate of SGD, without requiring any additional information on the Hessian of the loss function. The adaptive SGD, called SGD-G2, is successfully tested on standard datasets.
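To illustrate the general idea described in the abstract, here is a minimal, hypothetical sketch: a plain SGD (explicit Euler) step is paired with a Heun-type second-order stochastic Runge-Kutta step on the same mini-batch, and the discrepancy between the two steps is used to adapt the learning rate, in the spirit of step-size control for ODE solvers. The function name `sgd_g2_sketch`, the error target, and the adaptation rule are illustrative assumptions, not the paper's exact SGD-G2 update.

```python
import numpy as np

def sgd_g2_sketch(grad_fn, w0, lr=0.1, steps=100, target=1e-3):
    """Hedged sketch of an adaptive SGD driven by a second-order RK step.

    grad_fn(w) returns a stochastic (mini-batch) gradient estimate at w.
    Only two gradient evaluations per step are used; no Hessian information.
    """
    w, h = w0.astype(float), lr
    for _ in range(steps):
        g1 = grad_fn(w)                          # gradient at current point
        w_euler = w - h * g1                     # plain SGD (explicit Euler) step
        g2 = grad_fn(w_euler)                    # gradient at the trial point
        w_heun = w - 0.5 * h * (g1 + g2)         # second-order (Heun / RK2) step
        err = np.linalg.norm(w_heun - w_euler)   # local error estimate
        if err > 0:
            # Illustrative adaptation rule: keep the estimated local error
            # near `target` by rescaling the learning rate.
            h = float(np.clip(h * np.sqrt(target / err), 1e-6, 1.0))
        w = w_heun                               # advance with the higher-order step
    return w, h

# Usage example: noisy quadratic loss 0.5 * ||w||^2, stochastic gradient = w + noise.
rng = np.random.default_rng(0)
noisy_grad = lambda w: w + 0.01 * rng.standard_normal(w.shape)
w_final, h_final = sgd_g2_sketch(noisy_grad, np.array([2.0, -1.0]))
print(w_final, h_final)
```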