Paper Title

Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

Authors

Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

Abstract

Mean-Field Control (MFC) has recently been shown to be a scalable tool for approximately solving large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to the unconstrained cumulative reward maximization framework. In this paper, we show that the MFC approach can be used to approximate the MARL problem even in the presence of constraints. Specifically, we prove that an $N$-agent constrained MARL problem, where the state and action spaces of each individual agent are of sizes $|\mathcal{X}|$ and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In the special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. In addition, we provide a Natural Policy Gradient-based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
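To get a feel for the scaling of the stated bound, consider an illustrative instantiation (the specific numbers below are hypothetical, not taken from the paper): with $|\mathcal{X}| = 10$, $|\mathcal{U}| = 5$, and $N = 1000$ agents,

$$e = \mathcal{O}\!\left(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{\sqrt{N}}\right) = \mathcal{O}\!\left(\frac{\sqrt{10}+\sqrt{5}}{\sqrt{1000}}\right) \approx \mathcal{O}(0.17),$$

so the approximation error shrinks at rate $1/\sqrt{N}$ as the population grows, while the stated sample complexity $\mathcal{O}(e^{-6})$ correspondingly grows as $\mathcal{O}(N^{3})$ under this scaling.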
