Paper Title
Multi-Agent Reinforcement Learning in Stochastic Networked Systems
Authors
Abstract
We study multi-agent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.
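For readers unfamiliar with the last contribution mentioned in the abstract, the following is a minimal, generic sketch of temporal difference learning with state aggregation: states are grouped into clusters by a map phi, and a single value parameter is learned per cluster. The env interface, phi, and all constants below are hypothetical placeholders chosen for illustration; this is not the paper's algorithm or its finite-time analysis.

import numpy as np

# A minimal, generic sketch of TD(0) policy evaluation with state aggregation.
# The `env` interface, `phi`, and the constants are hypothetical illustrations
# of the general technique, not the paper's specific scheme.
def td0_state_aggregation(env, phi, num_clusters, gamma=0.95, alpha=0.05, episodes=500):
    w = np.zeros(num_clusters)              # one value parameter per aggregate state
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = env.sample_action(s)        # act with the fixed policy being evaluated
            s_next, r, done = env.step(a)
            target = r + (0.0 if done else gamma * w[phi(s_next)])
            w[phi(s)] += alpha * (target - w[phi(s)])   # TD(0) update on the visited cluster
            s = s_next
    return w                                # value estimate for each aggregate state

Because every state in a cluster shares one parameter, the number of learned values scales with the number of clusters rather than the (possibly exponential) number of global states, which is the kind of dimensionality reduction the abstract's scalability discussion concerns.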