论文标题

关于用于求解一致线性系统的异步SGD的收敛分析

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

论文作者

Sahu, Atal Narayan, Dutta, Aritra, Tiwari, Aashutosh, Richtárik, Peter

论文摘要

在大数据和机器学习的领域中,数据平行的,分布式随机算法在当今引起了极大的关注。在本文中,我们提出和分析了一个{\它分布的异步平行} sgd,鉴于解决了任意一致的线性系统,通过将系统重新定义为Richtárik和takác​​在[35]中研究的随机优化问题中,通过将系统重新构建为随机优化问题。我们将异步SGD算法的收敛速率与Richtárik和Takáč在[35]中提出的同步平行算法[35]中的同步平行算法进行了比较 - 步骤大小,步骤尺寸,阻尼因子,处理器的数量和延迟因子。我们表明,我们的异步平行SGD算法也享有全局线性收敛速率,类似于[\ em Basic}方法和[35]中的同步平行方法,用于通过随机完整求解任何任意一致的线性系统。我们还表明,当处理器的数量大于四个时,我们的异步并行SGD在{\ em basic}方法上以更好的收敛速率改进。我们进一步表明,这种异步方法在某些线性系统上的渐近性能比其同步对应物更好。此外,对于某些线性系统,我们计算了我们的异步并行SGD所需的最小处理器数量,并且发现对于某些条件不足的问题,此数字可以低于两个。

In the realm of big data and machine learning, data-parallel, distributed stochastic algorithms have drawn significant attention in the present days.~While the synchronous versions of these algorithms are well understood in terms of their convergence, the convergence analyses of their asynchronous counterparts are not widely studied. In this paper, we propose and analyze a {\it distributed, asynchronous parallel} SGD in light of solving an arbitrary consistent linear system by reformulating the system into a stochastic optimization problem as studied by Richtárik and Takác in [35]. We compare the convergence rates of our asynchronous SGD algorithm with the synchronous parallel algorithm proposed by Richtárik and Takáč in [35] under different choices of the hyperparameters---the stepsize, the damping factor, the number of processors, and the delay factor. We show that our asynchronous parallel SGD algorithm also enjoys a global linear convergence rate, similar to the {\em basic} method and the synchronous parallel method in [35] for solving any arbitrary consistent linear system via stochastic reformulation. We also show that our asynchronous parallel SGD improves upon the {\em basic} method with a better convergence rate when the number of processors is larger than four. We further show that this asynchronous approach performs asymptotically better than its synchronous counterpart for certain linear systems. Moreover, for certain linear systems, we compute the minimum number of processors required for which our asynchronous parallel SGD is better, and find that this number can be as low as two for some ill-conditioned problems.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源