论文标题
Wasserstein分布的一阶方法在强大的MDP上
First-Order Methods for Wasserstein Distributionally Robust MDP
论文作者
论文摘要
已知马尔可夫决策过程(MDP)对参数规范很敏感。通过允许\ emph {歧义集}来减轻分布强大的MDP,从而减轻了此问题,这些{歧义集}通过参数集给出了一组可能的分布。目标是找到有关最坏情况参数分布的最佳策略。我们提出了一个框架,用于通过一阶方法求解分布鲁棒的MDP,并将其实例化,以实现几种类型的Wasserstein歧义集。通过开发有效的近端更新,我们的算法达到了$ o \ left(Na^{2.5} s^{3.5} {3.5} \ log(s)\ log(ε^{ - 1})ε^{ - 1.5} \ right(ε^{ - 1})$ s $ n $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s $ s;根据Wasserstein设置,此速率略有不同。我们对$ n,$和$ s $的依赖性明显优于现有方法,而现有方法的复杂性为$ o \ left(n^{3.5} a^{3.5} s^{4.5} \ log^{2} \ log^{2}(ε^{ - 1})\ right)\ right)$。数值实验表明,我们的算法比几个领域的最先进方法明显更可扩展。
Markov decision processes (MDPs) are known to be sensitive to parameter specification. Distributionally robust MDPs alleviate this issue by allowing for \emph{ambiguity sets} which give a set of possible distributions over parameter sets. The goal is to find an optimal policy with respect to the worst-case parameter distribution. We propose a framework for solving Distributionally robust MDPs via first-order methods, and instantiate it for several types of Wasserstein ambiguity sets. By developing efficient proximal updates, our algorithms achieve a convergence rate of $O\left(NA^{2.5}S^{3.5}\log(S)\log(ε^{-1})ε^{-1.5} \right)$ for the number of kernels $N$ in the support of the nominal distribution, states $S$, and actions $A$; this rate varies slightly based on the Wasserstein setup. Our dependence on $N,A$ and $S$ is significantly better than existing methods, which have a complexity of $O\left(N^{3.5}A^{3.5}S^{4.5}\log^{2}(ε^{-1}) \right)$. Numerical experiments show that our algorithm is significantly more scalable than state-of-the-art approaches across several domains.