Title


Convergence of Batch Updating Methods with Approximate Gradients and/or Noisy Measurements: Theory and Computational Results

Authors

Tadipatri Uday Kiran Reddy, M. Vidyasagar

Abstract


In this paper, we present a unified and general framework for analyzing the batch updating approach to nonlinear, high-dimensional optimization. The framework encompasses all the currently used batch updating approaches, and is applicable to nonconvex as well as convex functions. Moreover, the framework permits the use of noise-corrupted gradients, as well as first-order approximations to the gradient (sometimes referred to as "gradient-free" approaches). By viewing the analysis of the iterations as a problem in the convergence of stochastic processes, we are able to establish a very general theorem, which includes most known convergence results for zeroth-order and first-order methods. The analysis of "second-order" or momentum-based methods is not a part of this paper, and will be studied elsewhere. However, numerical experiments indicate that momentum-based methods can fail if the true gradient is replaced by its first-order approximation. This requires further theoretical analysis.
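To make the abstract's setting concrete, the sketch below shows a plain batch update in which the true gradient is replaced by a first-order (two-point finite-difference) approximation corrupted by measurement noise. This is only an illustrative sketch of the general framework the abstract describes, not the paper's exact algorithm or assumptions; the step size, perturbation size, and noise model are placeholders chosen for the demonstration.

```python
import numpy as np

def two_point_gradient(f, x, c):
    """First-order approximation of the gradient of f at x: each coordinate
    is perturbed by +/- c and a central difference is taken. This is the kind
    of "gradient-free" estimate the abstract refers to (illustrative only)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = c
        g[i] = (f(x + e) - f(x - e)) / (2.0 * c)
    return g

def batch_update(f, x0, steps=200, eta=0.1, c=1e-4, noise=0.0, rng=None):
    """Plain batch updating: x_{t+1} = x_t - eta * (approximate gradient + noise).
    The additive Gaussian term models noise-corrupted gradient measurements."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = two_point_gradient(f, x, c)
        g = g + noise * rng.standard_normal(x.shape)  # noisy measurement
        x = x - eta * g
    return x

# Minimize the simple quadratic f(x) = ||x||^2, whose minimizer is the origin.
x_star = batch_update(lambda x: float(x @ x), [1.0, -2.0], noise=0.01)
```

On this quadratic the finite-difference estimate is exact up to the noise term, so the iterates contract toward the origin; the residual error is set by the noise level and step size, which is the trade-off the paper's convergence analysis formalizes.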
