Paper Title
Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
In cooperative multi-agent reinforcement learning, centralized training and decentralized execution (CTDE) has achieved remarkable success. Individual Global Max (IGM) decomposition, an important element of CTDE, measures the consistency between local and joint policies. Most IGM-based research focuses on how to establish this consistent relationship, but little attention has been paid to examining IGM's potential flaws. In this work, we reveal that the IGM condition is a lossy decomposition, and that the error of this lossy decomposition accumulates in hypernetwork-based methods. To address this issue, we propose an imitation learning strategy that separates the lossy decomposition from Bellman iterations, thereby avoiding error accumulation. The proposed strategy is theoretically proven and empirically verified on the StarCraft Multi-Agent Challenge benchmark with a zero sight view. The results also confirm that the proposed method outperforms state-of-the-art IGM-based approaches.
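For context, below is a minimal LaTeX sketch of the IGM condition as it is commonly stated in the value-factorization literature; the notation (per-agent utilities Q_i, local action-observation histories tau_i, joint value Q_tot for n agents) is standard in that literature rather than taken from this abstract.

```latex
% IGM condition: the greedy joint action under the joint value Q_tot
% coincides with the collection of per-agent greedy actions under the
% individual utilities Q_i (standard formulation, e.g. QTRAN/QMIX).
\arg\max_{\mathbf{a}} Q_{tot}(\boldsymbol{\tau}, \mathbf{a}) =
\begin{pmatrix}
  \arg\max_{a_1} Q_1(\tau_1, a_1) \\
  \vdots \\
  \arg\max_{a_n} Q_n(\tau_n, a_n)
\end{pmatrix}
```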