Paper Title
Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
Paper Authors
Paper Abstract
Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and further optimizes the codes by performing lossy compression on the derivative codewords, maximizing the information contained within each codeword while minimizing the information shared between codewords. The utility of this application of coding theory is a geometric consequence of a fact observed in optimization research: noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms, since it helps avoid overfitting and local minima. This stands in contrast with much of the current work on distributed coded computation, which focuses on recovering all of the data from the workers. A second contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates, since the code can be decoded iteratively; i.e., a worker's completed task can immediately be folded into the larger gradient. The directional derivative is always a linear function of the direction vector; thus, our framework is robust in that it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.
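As a rough illustration of the linearity argument in the abstract, the following sketch is not the authors' construction: the toy objective f, the random generator matrix G, the finite-difference directional derivative, and the straggler simulation are all illustrative assumptions. It only shows why directional derivatives along linearly coded direction vectors can be decoded into an approximate gradient from whichever worker results happen to arrive.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's code): workers evaluate
# directional derivatives along coded direction vectors; the master
# decodes an approximate gradient from the subset of results that
# return in time, mimicking asynchronous/straggler-tolerant updates.

def f(x):
    # Toy objective: a simple quadratic, so the true gradient is x itself.
    return 0.5 * np.dot(x, x)

def directional_derivative(f, x, v, eps=1e-6):
    # Finite-difference estimate of D_v f(x) = <grad f(x), v>,
    # which is linear in the direction vector v.
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

rng = np.random.default_rng(0)
d, num_workers = 8, 12
x = rng.normal(size=d)

# Hypothetical linear code: each worker is assigned one coded direction
# vector, i.e., one row of a random generator matrix G.
G = rng.normal(size=(num_workers, d))

# Simulate stragglers: only a subset of workers report back in time.
returned = rng.choice(num_workers, size=8, replace=False)

# Each returned worker contributes a single scalar: the derivative of f
# at x along its coded direction.
y = np.array([directional_derivative(f, x, G[i]) for i in returned])

# Decode an approximate gradient from the partial coded measurements via
# least squares; additional returns would simply refine this estimate.
grad_est, *_ = np.linalg.lstsq(G[returned], y, rcond=None)

print("true gradient    :", x)
print("decoded estimate :", grad_est)
```

Because differentiation in a fixed direction is linear in that direction, the encoding (mixing directions through G) and the decoding (a linear solve) commute with the derivative computation, which is the property the abstract relies on when applying linear coding techniques to general models such as deep neural networks.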