Paper Title

Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization

Paper Authors

Dereziński, Michał, Bartan, Burak, Pilanci, Mert, Mahoney, Michael W.

Paper Abstract

In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data. However, the local estimates on each machine are typically biased, relative to the full solution on all of the data, and this can limit the effectiveness of averaging. Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods. Our technique has two novel components: (1) modifying standard sketching techniques to obtain what we call a surrogate sketch; and (2) carefully scaling the global regularization parameter for local computations. Our surrogate sketches are based on determinantal point processes, a family of distributions for which the bias of an estimate of the inverse Hessian can be computed exactly. Based on this computation, we show that when the objective being minimized is $\ell_2$-regularized with parameter $\lambda$ and individual machines are each given a sketch of size $m$, then to eliminate the bias, local estimates should be computed using a shrunk regularization parameter given by $\lambda' = \lambda \cdot \left(1 - \frac{d_\lambda}{m}\right)$, where $d_\lambda$ is the $\lambda$-effective dimension of the Hessian (or, for quadratic problems, the data matrix).
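The debiasing rule in the abstract is straightforward to evaluate. The following is a minimal illustrative sketch (not code from the paper; the function names and example data are assumptions) of how one might compute the $\lambda$-effective dimension $d_\lambda$ of a data matrix and the shrunk local regularizer $\lambda' = \lambda(1 - d_\lambda/m)$ for a quadratic problem:

```python
import numpy as np

def effective_dimension(A, lam):
    """Lambda-effective dimension of the data matrix A:
    d_lambda = sum_i sigma_i^2 / (sigma_i^2 + lambda),
    i.e. tr(A^T A (A^T A + lambda I)^{-1}),
    where sigma_i are the singular values of A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.sum(sigma**2 / (sigma**2 + lam))

def shrunk_regularizer(lam, d_lam, m):
    """Scaled regularization parameter lambda' = lambda * (1 - d_lambda / m)
    for the local estimate on a machine given a sketch of size m."""
    assert m > d_lam, "sketch size m must exceed the effective dimension"
    return lam * (1.0 - d_lam / m)

# Hypothetical example: a random 1000 x 50 data matrix,
# global regularizer lambda = 0.1, per-machine sketch size m = 200.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))
lam = 0.1
d_lam = effective_dimension(A, lam)
lam_prime = shrunk_regularizer(lam, d_lam, 200)
print(f"d_lambda = {d_lam:.2f}, lambda' = {lam_prime:.4f}")
```

Since $d_\lambda < m$ by assumption, the local regularizer $\lambda'$ is strictly smaller than the global $\lambda$, which matches the intuition that each machine sees only a sketch of the data and must regularize less to remain unbiased relative to the full solution.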
