Paper Title
Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction
Paper Authors
Paper Abstract
Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown promise in accelerating the convergence in non-convex learning; however, an excessively large correction for avoiding biases from noisy energy estimators has limited the potential of the acceleration. To address this issue, we study variance reduction for the noisy energy estimators, which promotes much more effective swaps. Theoretically, we provide a non-asymptotic analysis of the exponential acceleration for the underlying continuous-time Markov jump process; moreover, we consider a generalized Girsanov theorem which includes the change of Poisson measure to overcome the crude discretization based on Grönwall's inequality and yields a much tighter error in the 2-Wasserstein ($\mathcal{W}_2$) distance. Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimates on synthetic experiments and image data.
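To make the core idea concrete, below is a minimal, self-contained sketch of variance-reduced reSGLD on a toy one-dimensional Gaussian energy. It is an illustration of the general technique described in the abstract, not the paper's actual implementation: the names (`vr_energy`, `sgld_step`, `swap_prob` logic), the control-variate anchor `theta_hat`, the fixed variance placeholder `sigma2`, and all hyperparameters are assumptions chosen for readability.

```python
import numpy as np

# Sketch: two SGLD chains at temperatures tau_low < tau_high, with swaps
# accepted via a bias-corrected test on a variance-reduced energy estimator.
# Toy energy: U(theta) = sum_i (theta - x_i)^2 / 2 over synthetic data.

rng = np.random.default_rng(0)
N = 1000
data = rng.standard_normal(N)           # synthetic observations
batch_size = 32

def u_i(theta, idx):                    # per-example energy terms
    return 0.5 * (theta - data[idx]) ** 2

def grad_estimate(theta, batch):        # minibatch gradient of the full energy
    return (N / len(batch)) * np.sum(theta - data[batch])

def full_energy(theta):
    return 0.5 * np.sum((theta - data) ** 2)

# Control variate: anchor the minibatch energy estimate at a fixed theta_hat,
#   U_vr(theta) = (N/|B|) * sum_{i in B} [u_i(theta) - u_i(theta_hat)] + U(theta_hat),
# which is unbiased and has low variance when theta is near theta_hat.
theta_hat = data.mean()
U_hat = full_energy(theta_hat)

def vr_energy(theta, batch):
    scale = N / len(batch)
    return scale * np.sum(u_i(theta, batch) - u_i(theta_hat, batch)) + U_hat

def sgld_step(theta, tau, lr, batch):   # one SGLD update at temperature tau
    noise = np.sqrt(2.0 * lr * tau) * rng.standard_normal()
    return theta - lr * grad_estimate(theta, batch) + noise

tau_low, tau_high = 1.0, 10.0           # two-rung temperature ladder
lr = 1e-4
sigma2 = 1.0                            # placeholder variance of the energy
                                        # estimator; estimated online in practice
theta_low, theta_high = 2.0, -2.0

for step in range(2000):
    batch = rng.choice(N, size=batch_size, replace=False)
    theta_low = sgld_step(theta_low, tau_low, lr, batch)
    theta_high = sgld_step(theta_high, tau_high, lr, batch)
    if step % 20 == 0:                  # periodic swap attempt
        dU = vr_energy(theta_high, batch) - vr_energy(theta_low, batch)
        delta = 1.0 / tau_low - 1.0 / tau_high
        # The bias correction delta * sigma2 / 2 shrinks as the estimator
        # variance shrinks, so variance reduction directly enlarges the
        # acceptance probability and yields more effective swaps.
        log_s = delta * (dU - delta * sigma2 / 2.0)
        if np.log(rng.uniform()) < min(0.0, log_s):
            theta_low, theta_high = theta_high, theta_low
```

The sketch highlights the mechanism the abstract refers to: the swap test must subtract a correction proportional to the variance of the noisy energy estimator, so reducing that variance (here via a simple control variate) makes swaps, and hence the exponential acceleration of the jump process, far more effective.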