Paper title

On Bellman's Optimality Principle for zs-POSGs

Authors

Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas

Abstract

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $ε$-Nash equilibrium in finite time.
