论文标题
$β$ - 高分辨率的颂歌和NAG-SC和重球方法之间的相变
$β$-High Resolution ODE and Phase Transition between NAG-SC and Heavy Ball Method
论文作者
论文摘要
在本文中,我们研究了一种算法的收敛性能,该算法可以看作是两种基于梯度的优化方法之间的插值,即Nesterov的强烈凸功能函数的加速方法$(NAG $ - $ SC)$和Polyak的重球方法。在使用高分辨率的普通微分方程(ODE)方面取得了最新进展,以区分这两种根本不同的方法。它们之间的关键差异可以归因于梯度校正项,该梯度校正项在高分辨率ode中被Hessian项所反映。我们的目标是了解该术语如何影响融合率和步骤大小的选择。为了实现这一目标,我们介绍了$β$ - 高分辨率ode的概念,$ 0 \ leqβ\ leq 1 $,并证明在步骤大小的某些范围内,相位过渡发生在$β_C$。当$β_C\leqβ\ leq 1 $时,与$β$ - 高分辨率ODE相关的算法具有与NAG-SC相同的收敛速率。当$ 0 \ leqβ\ leqβ_c$时,该算法的收敛速率将比NAG-SC慢。
In this paper, we study the convergence properties of an algorithm that can be viewed as an interpolation between two gradient based optimization methods, Nesterov's acceleration method for strongly convex functions $(NAG$-$SC)$ and Polyak's heavy ball method. Recent Progress has been made on using High-Resolution ordinary differential equations (ODEs) to distinguish these two fundamentally different methods. The key difference between them can be attributed to the gradient correction term, which is reflected by the Hessian term in the High-Resolution ODE. Our goal is to understand how this term can affect the convergence rate and the choice of our step size. To achieve this goal, we introduce the notion of $β$-High Resolution ODE, $0\leq β\leq 1$ and prove that within certain range of step size, there is a phase transition happening at $β_c$. When $β_c\leqβ\leq 1$, the algorithm associated with $β$-High Resolution ODE have the same convergence rate as NAG-SC. When $0\leq β\leq β_c$, this algorithm will have the slower convergence rate than NAG-SC.