Paper Title

Mean-Field Controls with Q-learning for Cooperative MARL: Convergence and Complexity Analysis

Paper Authors

Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu

Paper Abstract

Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds the mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach, and shows that the approximation error is of $\mathcal{O}(\frac{1}{\sqrt{N}})$. By establishing an appropriate form of the dynamic programming principle for both the value function and the Q function, it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature. It further establishes that the convergence rate and the sample complexity of MFC-K-Q are independent of the number of agents $N$, which provides an $\mathcal{O}(\frac{1}{\sqrt{N}})$ approximation to the MARL problem with $N$ agents in the learning environment. Empirical studies for the network traffic congestion problem demonstrate that MFC-K-Q outperforms existing MARL algorithms when $N$ is large, for instance when $N>50$.
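
The abstract does not include pseudocode, but the following is a minimal, hypothetical Python sketch of the core idea it describes: Q-learning on the lifted mean-field MDP, where the state is the population distribution over agent states, the Q function is stored on an epsilon-net of the probability simplex, and updates are spread across net points by kernel weights. Every name here (ToyMeanFieldEnv, kernel_weights, the reward, and the grid sizes) is an illustrative assumption, not the paper's MFC-K-Q algorithm or its traffic-congestion environment.

```python
import numpy as np

class ToyMeanFieldEnv:
    """Hypothetical mean-field environment: the 'state' is the population
    distribution mu over two agent states; the 'action' is the fraction of
    agents in state 0 that switch to state 1."""
    def __init__(self):
        self.mu = np.array([0.5, 0.5])

    def step(self, action):
        # Deterministic flow of the population distribution under the action.
        switch = action * self.mu[0]
        new_mu = np.array([self.mu[0] - switch, self.mu[1] + switch])
        reward = -new_mu[1] ** 2      # illustrative: penalize congestion in state 1
        self.mu = new_mu
        return new_mu, reward

def kernel_weights(mu, centers, bandwidth=0.2):
    """Triangular-kernel weights of mu against the epsilon-net centers
    (the two-state simplex is parameterized by mu[0])."""
    d = np.abs(centers - mu[0])
    w = np.maximum(0.0, 1.0 - d / bandwidth)
    return w / w.sum() if w.sum() > 0 else np.ones_like(w) / len(w)

# Epsilon-net over the simplex and a finite action grid.
centers = np.linspace(0.0, 1.0, 11)
actions = np.linspace(0.0, 1.0, 5)
Q = np.zeros((len(centers), len(actions)))   # Q values at the net points
gamma, lr, eps = 0.9, 0.5, 0.2

rng = np.random.default_rng(0)
env = ToyMeanFieldEnv()
mu = env.mu
for t in range(2000):
    if t % 50 == 0:                          # periodically restart from a random mu
        p = rng.random()
        env.mu = np.array([p, 1.0 - p])
        mu = env.mu
    w = kernel_weights(mu, centers)
    # Epsilon-greedy action on the kernel-interpolated Q values.
    if rng.random() < eps:
        a_idx = int(rng.integers(len(actions)))
    else:
        a_idx = int(np.argmax(w @ Q))
    next_mu, r = env.step(actions[a_idx])
    w_next = kernel_weights(next_mu, centers)
    target = r + gamma * np.max(w_next @ Q)  # greedy bootstrap on the net
    td_error = target - w @ Q[:, a_idx]
    Q[:, a_idx] += lr * td_error * w         # kernel-weighted TD update
    mu = next_mu

w0 = kernel_weights(np.array([0.5, 0.5]), centers)
print("Greedy action at mu = [0.5, 0.5]:", actions[int(np.argmax(w0 @ Q))])
```

Note that the sample complexity of this sketch depends only on the resolution of the epsilon-net and the action grid, not on any number of agents N, which mirrors the independence-of-N property the abstract claims for MFC-K-Q.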
