On Bellman's Optimality Principle for zs-POSGs

Paper title

On Bellman's Optimality Principle for zs-POSGs

Authors

Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas

Abstract

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $ε$-Nash equilibrium in finite time.
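
To illustrate why Lipschitz-continuity is useful for an HSVI-style search, the sketch below computes generic lower and upper bounds on a Lipschitz function from a few evaluated points. It is not the paper's algorithm or its bounding approximators: the function name `lipschitz_bounds`, the choice of the 1-norm, the constant `lam`, and the toy function are all assumptions made for this example.

```python
import numpy as np

def lipschitz_bounds(points, values, lam, x):
    """Bound f(x) for a lam-Lipschitz f (1-norm), given samples f(points[i]) = values[i].

    Hypothetical helper for illustration only; not taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    # 1-norm distance from x to every sampled point.
    dists = np.abs(points - np.asarray(x, dtype=float)).sum(axis=1)
    lower = np.max(values - lam * dists)  # tightest lower bound over all samples
    upper = np.min(values + lam * dists)  # tightest upper bound over all samples
    return lower, upper

# Toy function f(p) = p[0], which is 1-Lipschitz in the 1-norm.
pts = [[1.0, 0.0], [0.0, 1.0]]
vals = [1.0, 0.0]
lo, up = lipschitz_bounds(pts, vals, lam=1.0, x=[0.5, 0.5])
print(lo, up)  # the true value f([0.5, 0.5]) = 0.5 lies between the two bounds
```

Adding more evaluated points can only tighten the gap between the two bounds. A heuristic search of this kind keeps evaluating new points until the gap at the initial point falls below the target $ε$.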
