Paper Title
Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
Paper Authors
Paper Abstract
We consider the task of sampling from a log-concave probability distribution. The potential of the target distribution is assumed to be composite, \textit{i.e.}, written as the sum of a smooth convex term and a nonsmooth convex term that may take infinite values. The target distribution can be seen as the minimizer of the Kullback-Leibler divergence defined over the Wasserstein space (\textit{i.e.}, the space of probability measures). In the first part of this paper, we establish a strong duality result for this minimization problem. In the second part, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm. Our approach relies on viewing PSGLA as a primal-dual algorithm and covers many cases where the target distribution is not fully supported. In particular, we show that if the potential is strongly convex, the complexity of PSGLA is $O(1/\varepsilon^2)$ in terms of the 2-Wasserstein distance. In contrast, the complexity of the Projected Langevin Algorithm is $O(1/\varepsilon^{12})$ in terms of total variation when the potential is merely convex.
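To make the algorithm under study concrete, the following is a minimal Python sketch of one standard form of the PSGLA iteration, $x_{k+1} = \mathrm{prox}_{\gamma g}\big(x_k - \gamma \nabla f(x_k, \xi_{k+1}) + \sqrt{2\gamma}\, W_{k+1}\big)$, where $W_{k+1}$ is standard Gaussian noise. The toy target (a standard Gaussian truncated to a box, so that $g$ is an indicator function and its proximity operator is a Euclidean projection), the step size, and the iteration count are illustrative assumptions, not the paper's experimental setup.

import numpy as np

def psgla(grad_f, prox_g, x0, gamma, n_iters, rng):
    # At each step: take a (possibly stochastic) gradient step on the smooth
    # term f, inject Gaussian noise scaled by sqrt(2 * gamma), then apply the
    # proximity operator of the nonsmooth term g.
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        noise = rng.standard_normal(x.size)
        x = prox_g(x - gamma * grad_f(x, rng) + np.sqrt(2.0 * gamma) * noise)
        samples[k] = x
    return samples

# Toy composite potential (an assumption for illustration): f(x) = ||x||^2 / 2,
# smooth and strongly convex, and g the indicator of the box [0, 1]^d, whose
# proximity operator is the projection onto the box. The target is then a
# standard Gaussian truncated to the box, hence not fully supported on R^d.
rng = np.random.default_rng(0)
d = 2
grad_f = lambda x, rng: x                  # exact gradient; a noisy estimate also fits
prox_g = lambda x: np.clip(x, 0.0, 1.0)    # projection onto [0, 1]^d
samples = psgla(grad_f, prox_g, x0=np.full(d, 0.5), gamma=1e-2, n_iters=20000, rng=rng)
print(samples[5000:].mean(axis=0))         # empirical mean after discarding burn-in

Note that when the gradient is exact and $g$ is an indicator function, the iteration above reduces to the Projected Langevin Algorithm, which is the sense in which PSGLA generalizes it.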