Paper Title


Linearly Convergent Algorithm with Variance Reduction for Distributed Stochastic Optimization

Paper Authors

Lei, Jinlong; Yi, Peng; Chen, Jie; Hong, Yiguang

Paper Abstract


This paper considers a distributed stochastic strongly convex optimization problem, where agents connected over a network aim to cooperatively minimize the average of all agents' local cost functions. Due to the stochasticity of gradient estimation and the distributed nature of the local objectives, fast linearly convergent distributed algorithms have not been achieved yet. This work proposes a novel distributed stochastic gradient tracking algorithm with variance reduction, where the local gradients are estimated with an increasing batch size of sampled gradients. With an undirected connected communication graph and a geometrically increasing batch size, the iterates are shown to converge in mean to the optimal solution at a geometric rate (achieving linear convergence). The iteration, communication, and oracle complexities for obtaining an $ε$-optimal solution are established as well. In particular, the communication complexity is $\mathcal{O}(\ln (1/ε))$ while the oracle complexity (number of sampled gradients) is $\mathcal{O}(1/ε^2)$, which is of the same order as that of centralized approaches. Hence, the proposed scheme is communication-efficient without requiring extra sampled gradients. Numerical simulations are given to demonstrate the theoretical results.
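To make the algorithmic idea concrete, below is a minimal sketch of distributed stochastic gradient tracking with a geometrically increasing batch size, in the spirit of the abstract rather than a reproduction of the paper's algorithm. It assumes a 5-agent ring network with Metropolis mixing weights, quadratic local costs, and additive Gaussian noise standing in for the sampling error of the gradient oracle; the step size `alpha` and batch growth rate `q` are illustrative choices, not the parameters analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem: n agents on a ring, each with a strongly convex
# quadratic local cost f_i(x) = 0.5 * ||A_i x - b_i||^2; the network goal
# is to minimize the average of these local costs.
n, d, m = 5, 3, 10
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]

# Exact minimizer of the average cost, used only to monitor the error.
x_star = np.linalg.solve(sum(Ai.T @ Ai for Ai in A),
                         sum(Ai.T @ bi for Ai, bi in zip(A, b)))

# Doubly stochastic mixing matrix for the undirected ring (Metropolis weights).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    W[i, i] = 1.0 / 3.0

def sampled_gradient(i, x, batch):
    """Mini-batch stochastic gradient of f_i at x. Gaussian noise stands in
    for the sampling error; averaging a batch of size B scales its standard
    deviation by 1/sqrt(B), so the batch-averaged noise is drawn directly."""
    return A[i].T @ (A[i] @ x - b[i]) + rng.standard_normal(d) / np.sqrt(batch)

# Gradient tracking with variance reduction via an increasing batch size.
alpha, q = 0.01, 1.3                  # step size and batch growth rate (illustrative)
batch = 1
x = np.zeros((n, d))                  # local iterates, one row per agent
grad = np.array([sampled_gradient(i, x[i], batch) for i in range(n)])
y = grad.copy()                       # gradient trackers, initialized at local gradients

for k in range(100):
    x_new = W @ x - alpha * y         # consensus step plus descent along the tracker
    batch = int(np.ceil(batch * q))   # geometrically increasing batch size
    grad_new = np.array([sampled_gradient(i, x_new[i], batch) for i in range(n)])
    y = W @ y + grad_new - grad       # track the network-average gradient
    x, grad = x_new, grad_new

print("mean distance to the optimum:", np.linalg.norm(x - x_star, axis=1).mean())
```

As stated in the abstract, growing the batch size geometrically is what allows the iterates to converge at a geometric rate while keeping the total number of sampled gradients for an $ε$-optimal solution at $\mathcal{O}(1/ε^2)$ and the number of communication rounds at $\mathcal{O}(\ln(1/ε))$.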
