Paper Title

SRKCD: a stabilized Runge-Kutta method for stochastic optimization

Paper Authors

Tony Stillfjord, Måns Williamson

Abstract

We introduce a family of stochastic optimization methods based on the Runge-Kutta-Chebyshev (RKC) schemes. The RKC methods are explicit methods originally designed for solving stiff ordinary differential equations by ensuring that their stability regions are of maximal size. In the optimization context, this allows for larger step sizes (learning rates) and better robustness compared to, e.g., the popular stochastic gradient descent method. Our main contribution is a convergence proof for essentially all stochastic Runge-Kutta optimization methods. This shows convergence in expectation with an optimal sublinear rate under standard assumptions of strong convexity and Lipschitz-continuous gradients. For non-convex objectives, we get convergence to zero in expectation of the gradients. The proof requires certain natural conditions on the Runge-Kutta coefficients, and we further demonstrate that the RKC schemes satisfy these. Finally, we illustrate the improved stability properties of the methods in practice by performing numerical experiments on both a small-scale test example and on a problem arising from an image classification application in machine learning.
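
As a rough illustration of how an RKC-type scheme turns into an optimizer update, the sketch below performs one damped first-order RKC step applied to the gradient flow w' = -∇f(w), with the exact gradient replaced by a mini-batch gradient. This is a minimal sketch of the idea described in the abstract, not the authors' reference implementation; the names srkcd_step, grad_fn, batch, and the defaults s=5, eps=0.05 are illustrative assumptions.

```python
import numpy as np

def chebyshev_T(s, x):
    """Values T_0(x), ..., T_s(x) of the Chebyshev polynomials of the first kind."""
    T = np.empty(s + 1)
    T[0] = 1.0
    if s >= 1:
        T[1] = x
    for j in range(2, s + 1):
        T[j] = 2.0 * x * T[j - 1] - T[j - 2]
    return T

def srkcd_step(w, grad_fn, batch, h, s=5, eps=0.05):
    """One damped first-order RKC step for the gradient flow w' = -grad f(w),
    with the exact gradient replaced by a stochastic one, grad_fn(w, batch).
    Sketch of the SRKCD idea only, not the authors' reference code."""
    w0 = 1.0 + eps / s**2                 # damping slightly shifts the Chebyshev argument
    T = chebyshev_T(s, w0)
    # T_s'(x) = s * U_{s-1}(x), with U the Chebyshev polynomials of the second kind
    U = np.empty(s)
    U[0] = 1.0
    if s >= 2:
        U[1] = 2.0 * w0
    for j in range(2, s):
        U[j] = 2.0 * w0 * U[j - 1] - U[j - 2]
    w1 = T[s] / (s * U[s - 1])

    # Three-term stage recursion Y_0, ..., Y_s; the new iterate is Y_s.
    Y0 = w
    Y_prev2 = Y0
    Y_prev1 = Y0 - (w1 / w0) * h * grad_fn(Y0, batch)   # first stage, mu~_1 = w1 / T_1(w0)
    for j in range(2, s + 1):
        mu = 2.0 * w0 * T[j - 1] / T[j]
        nu = -T[j - 2] / T[j]
        mu_t = 2.0 * w1 * T[j - 1] / T[j]
        Yj = ((1.0 - mu - nu) * Y0
              + mu * Y_prev1 + nu * Y_prev2
              - mu_t * h * grad_fn(Y_prev1, batch))
        Y_prev2, Y_prev1 = Y_prev1, Yj
    return Y_prev1
```

With s = 1 the step reduces to plain SGD with learning rate h; for s > 1 the enlarged stability region of the Chebyshev recursion is what permits the larger step sizes mentioned in the abstract, at the cost of s stochastic gradient evaluations per step.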
