Wasserstein分布的一阶方法在强大的MDP上

论文标题

Wasserstein分布的一阶方法在强大的MDP上

First-Order Methods for Wasserstein Distributionally Robust MDP

论文作者

Grand-Clément, Julien, Kroer, Christian

论文摘要

已知马尔可夫决策过程（MDP）对参数规范很敏感。通过允许\ emph {歧义集}来减轻分布强大的MDP，从而减轻了此问题，这些{歧义集}通过参数集给出了一组可能的分布。目标是找到有关最坏情况参数分布的最佳策略。我们提出了一个框架，用于通过一阶方法求解分布鲁棒的MDP，并将其实例化，以实现几种类型的Wasserstein歧义集。通过开发有效的近端更新，我们的算法达到了$ o \ left（Na^{2.5} s^{3.5} {3.5} \ log（s）\ log（ε^{ - 1}）ε^{ - 1.5} \ right（ε^{ - 1}）$ s $ n $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s;根据Wasserstein设置，此速率略有不同。我们对$ n，$和$ s $的依赖性明显优于现有方法，而现有方法的复杂性为$ o \ left（n^{3.5} a^{3.5} s^{4.5} \ log^{2} \ log^{2}（ε^{ - 1}）\ right）\ right）$。数值实验表明，我们的算法比几个领域的最先进方法明显更可扩展。

Markov decision processes (MDPs) are known to be sensitive to parameter specification. Distributionally robust MDPs alleviate this issue by allowing for \emph{ambiguity sets} which give a set of possible distributions over parameter sets. The goal is to find an optimal policy with respect to the worst-case parameter distribution. We propose a framework for solving Distributionally robust MDPs via first-order methods, and instantiate it for several types of Wasserstein ambiguity sets. By developing efficient proximal updates, our algorithms achieve a convergence rate of $O\left(NA^{2.5}S^{3.5}\log(S)\log(ε^{-1})ε^{-1.5} \right)$ for the number of kernels $N$ in the support of the nominal distribution, states $S$, and actions $A$; this rate varies slightly based on the Wasserstein setup. Our dependence on $N,A$ and $S$ is significantly better than existing methods, which have a complexity of $O\left(N^{3.5}A^{3.5}S^{4.5}\log^{2}(ε^{-1}) \right)$. Numerical experiments show that our algorithm is significantly more scalable than state-of-the-art approaches across several domains.

下载PDF全文

下载文献需遵守相关版权规定

论文标题