Paper Title

Distributed Momentum for Byzantine-resilient Learning

Paper Authors

El-Mahdi El-Mhamdi, Rachid Guerraoui, Sébastien Rouault

Abstract

Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented either at the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are however among motivations to abandon linear aggregation rules. In this work, we demonstrate the benefits on robustness of using momentum at the worker side. We first prove that computing momentum at the workers reduces the variance-norm ratio of the gradient estimation at the server, strengthening Byzantine resilient aggregation rules. We then provide an extensive experimental demonstration of the robustness effect of worker-side momentum on distributed SGD.
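The abstract's key distinction is between applying the momentum buffer at the server versus at each worker, which only matters once the server's aggregation rule is non-linear. The sketch below illustrates that setup; it assumes coordinate-wise median as a stand-in for the Byzantine-resilient aggregation rules the paper considers, and the `Worker` class, `server_step` function, and the `beta`/`lr` values are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-in for a non-linear, Byzantine-resilient
# aggregation rule: the median of each coordinate across workers.
def coordinatewise_median(gradients):
    return np.median(np.stack(gradients), axis=0)

class Worker:
    """Worker-side momentum: each worker keeps its own momentum buffer
    and submits the accumulated vector instead of the raw gradient."""
    def __init__(self, beta=0.9):  # beta is an illustrative value
        self.beta = beta
        self.buffer = None

    def submit(self, gradient):
        if self.buffer is None:
            self.buffer = np.zeros_like(gradient)
        self.buffer = self.beta * self.buffer + gradient
        return self.buffer

def server_step(params, workers, local_gradients, lr=0.01):
    # The server aggregates the workers' momentum vectors with the
    # non-linear rule, then applies a plain SGD update. With a linear
    # rule (e.g., the mean), moving the momentum buffer to the server
    # would yield an identical update; with the median it would not.
    submitted = [w.submit(g) for w, g in zip(workers, local_gradients)]
    return params - lr * coordinatewise_median(submitted)
```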
