Paper Title

A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning

Authors

Qingxu Fu, Tenghai Qiu, Zhiqiang Pu, Jianqiang Yi, Wanmai Yuan

Abstract

Multiagent reinforcement learning (MARL) can solve complex cooperative tasks. However, the efficiency of existing MARL methods relies heavily on well-defined reward functions. Multiagent tasks with sparse reward feedback are especially challenging, not only because of the credit assignment problem but also because of the low probability of obtaining positive reward feedback. In this paper, we design a graph network called the Cooperation Graph (CG). The Cooperation Graph is the combination of two simple bipartite graphs, namely the Agent Clustering subgraph (ACG) and the Cluster Designating subgraph (CDG). Based on this novel graph structure, we propose the Cooperation Graph Multiagent Reinforcement Learning (CG-MARL) algorithm, which can efficiently deal with the sparse reward problem in multiagent tasks. In CG-MARL, agents are directly controlled by the Cooperation Graph, while a policy neural network is trained to manipulate this graph, guiding agents to achieve cooperation in an implicit way. The hierarchical structure of CG-MARL provides space for customized cluster-actions, an extensible interface for introducing fundamental cooperation knowledge. In experiments, CG-MARL shows state-of-the-art performance on sparse reward multiagent benchmarks, including an anti-invasion interception task and a multi-cargo delivery task.
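The two bipartite subgraphs described above can be pictured as a minimal data structure: the ACG maps each agent to a cluster, and the CDG maps each cluster to a cluster-action. The sketch below is only an illustration of this structure; all names (`CooperationGraph`, `assign_agent`, `designate`) and the list-based edge representation are assumptions, not the paper's actual implementation.

```python
class CooperationGraph:
    """Sketch of a Cooperation Graph: two bipartite subgraphs.

    ACG (Agent Clustering subgraph):   agent i  -> cluster acg[i]
    CDG (Cluster Designating subgraph): cluster j -> cluster-action cdg[j]
    """

    def __init__(self, n_agents, n_clusters, cluster_actions):
        self.n_agents = n_agents
        self.cluster_actions = cluster_actions   # library of cooperative behaviors
        self.acg = [0] * n_agents                # ACG edges (all agents start in cluster 0)
        self.cdg = [0] * n_clusters              # CDG edges (all clusters start on action 0)

    def assign_agent(self, agent, cluster):
        """An edit the policy network could make: move an agent to a cluster."""
        self.acg[agent] = cluster

    def designate(self, cluster, action_idx):
        """An edit the policy network could make: give a cluster a cluster-action."""
        self.cdg[cluster] = action_idx

    def agent_actions(self):
        """Each agent executes the cluster-action designated to its cluster."""
        return [self.cluster_actions[self.cdg[self.acg[i]]]
                for i in range(self.n_agents)]


# Example: four agents, two clusters, two hypothetical cluster-actions.
cg = CooperationGraph(4, 2, ["intercept", "deliver"])
cg.assign_agent(3, 1)      # ACG edit: agent 3 joins cluster 1
cg.designate(1, 1)         # CDG edit: cluster 1 performs "deliver"
print(cg.agent_actions())  # ['intercept', 'intercept', 'intercept', 'deliver']
```

In this reading, the policy network never outputs low-level agent actions directly; it only rewires the ACG and CDG edges, and the agents' behavior follows from the graph.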
