Paper Title

An $O(s^r)$-Resolution ODE Framework for Understanding Discrete-Time Algorithms and Applications to the Linear Convergence of Minimax Problems

Author

Lu, Haihao

Abstract

Ordinary differential equations (ODEs) have long been used to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, two fundamental questions remain unanswered: (i) it is unclear how to obtain a \emph{suitable} ODE from a given DTA, and (ii) the connection between the convergence of a DTA and that of its corresponding ODEs is unclear. In this paper, we propose a new machinery -- an $O(s^r)$-resolution ODE framework -- for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: (1) To obtain a suitable ODE from a given DTA, we define a hierarchy of $O(s^r)$-resolution ODEs of a DTA parameterized by the degree $r$, where $s$ is the step-size of the DTA. We present a principled approach to construct the unique $O(s^r)$-resolution ODE from a DTA. (2) To analyze the resulting ODE, we propose the $O(s^r)$-linear-convergence condition of a DTA with respect to an energy function, under which the $O(s^r)$-resolution ODE converges linearly to an optimal solution. (3) To bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that linear convergence of the $O(s^r)$-resolution ODE with respect to a proper energy function automatically guarantees linear convergence of the DTA. To better illustrate this machinery, we use it to study three classic algorithms -- gradient descent ascent (GDA), proximal point method (PPM) and extra-gradient method (EGM) -- for solving the unconstrained minimax problem $\min_{x\in\mathbb{R}^n} \max_{y\in\mathbb{R}^m} L(x,y)$.
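To make the three algorithms named in the abstract concrete, the following is a minimal sketch of one GDA, PPM and EGM step on a toy problem. The scalar bilinear objective $L(x,y) = xy$, the step-size `s = 0.1`, the starting point and the function names are all assumptions chosen for illustration; none of them come from the paper. On this toy problem the unique saddle point is $(0,0)$, and the closed-form PPM update is specific to the bilinear case.

```python
import math

# Toy objective (an assumption for illustration): L(x, y) = x * y,
# so dL/dx = y and dL/dy = x, with saddle point (0, 0).

def gda_step(x, y, s):
    """Gradient descent ascent: descend in x, ascend in y, simultaneously."""
    return x - s * y, y + s * x

def ppm_step(x, y, s):
    """Proximal point method: implicit update x+ = x - s*y+, y+ = y + s*x+.
    For L(x, y) = x * y this 2x2 linear system has the closed form below."""
    d = 1.0 + s * s
    return (x - s * y) / d, (y + s * x) / d

def egm_step(x, y, s):
    """Extra-gradient method: take a trial GDA step, then update the
    original point using the gradient evaluated at the trial point."""
    x_half, y_half = x - s * y, y + s * x
    return x - s * y_half, y + s * x_half

def distance_to_saddle(step, x0=1.0, y0=1.0, s=0.1, iters=200):
    """Run `iters` steps and return the Euclidean distance to (0, 0)."""
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, s)
    return math.hypot(x, y)

if __name__ == "__main__":
    for name, step in [("GDA", gda_step), ("PPM", ppm_step), ("EGM", egm_step)]:
        print(name, distance_to_saddle(step))
```

On this example each GDA step multiplies the squared distance to the saddle point by $1+s^2$, so the iterates spiral outward, while PPM and EGM multiply it by $1/(1+s^2)$ and $1-s^2+s^4$ respectively and therefore contract for $0<s<1$. This kind of difference in behavior between algorithms that look alike at first order is what an ODE description at finer resolution in $s$ is meant to capture.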
