Paper Title
Graph Convolutional Value Decomposition in Multi-Agent Reinforcement Learning
Paper Authors
Abstract
We propose a novel framework for value function factorization in multi-agent deep reinforcement learning (MARL) using graph neural networks (GNNs). In particular, we consider the team of agents as the set of nodes of a complete directed graph, whose edge weights are governed by an attention mechanism. Building upon this underlying graph, we introduce a mixing GNN module, which is responsible for i) factorizing the team state-action value function into individual per-agent observation-action value functions, and ii) explicit credit assignment to each agent in terms of fractions of the global team reward. Our approach, which we call GraphMIX, follows the centralized training and decentralized execution paradigm, enabling the agents to make their decisions independently once training is completed. We show the superiority of GraphMIX as compared to the state-of-the-art on several scenarios in the StarCraft II multi-agent challenge (SMAC) benchmark. We further demonstrate how GraphMIX can be used in conjunction with a recent hierarchical MARL architecture to both improve the agents' performance and enable fine-tuning them on mismatched test scenarios with higher numbers of agents and/or actions.
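The core idea of the mixing GNN can be sketched as follows: agents form the nodes of a complete directed graph, attention produces the edge weights, one message-passing step mixes agent embeddings, and a softmax readout yields both per-agent credit fractions and the team value as a weighted combination of the per-agent utilities. This is a minimal illustrative sketch, not the authors' exact architecture (GraphMIX conditions on the global state and uses learned attention networks); the names `graph_mix`, `w_msg`, and `w_out` and the dot-product attention are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_mix(q_agents, h_agents, w_msg, w_out):
    """One attention-weighted message-passing step over a complete
    directed agent graph, followed by a readout producing the team
    value q_tot and per-agent credit fractions.

    q_agents: (n,)   per-agent observation-action values
    h_agents: (n, d) per-agent embeddings
    w_msg:    (d, d) message transform (illustrative parameter)
    w_out:    (d, 1) readout transform (illustrative parameter)
    """
    # Attention logits from pairwise dot products of agent embeddings;
    # every agent attends to every other agent (complete directed graph).
    logits = h_agents @ h_agents.T            # (n, n)
    alpha = softmax(logits, axis=-1)          # edge weights, rows sum to 1
    # Aggregate neighbour messages with the attention weights.
    msgs = alpha @ (h_agents @ w_msg)         # (n, d)
    node_out = np.tanh(msgs) @ w_out          # (n, 1) per-node scores
    # Credit fractions: softmax over node scores, so they are positive
    # and sum to 1; q_tot is then a convex mix of per-agent values.
    credit = softmax(node_out.ravel())
    q_tot = float(credit @ q_agents)
    return q_tot, credit
```

Because the credit fractions come from a softmax, each agent's share of the team value is explicit and non-negative, which is a simplified stand-in for the paper's notion of assigning fractions of the global team reward to individual agents.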