Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition

Paper Title

Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition

Authors

Shunyu Liu, Yihe Zhou, Jie Song, Tongya Zheng, Kaixuan Chen, Tongtian Zhu, Zunlei Feng, Mingli Song

Abstract

Value Decomposition (VD) aims to deduce the contributions of agents for decentralized policies in the presence of only global rewards, and has recently emerged as a powerful credit assignment paradigm for tackling cooperative Multi-Agent Reinforcement Learning (MARL) problems. One of the main challenges in VD is to promote diverse behaviors among agents, and existing methods address it by directly encouraging the diversity of the learned agent networks through various strategies. However, we argue that these dedicated designs for agent networks are still limited by the indistinguishable VD network, which leads to homogeneous agent behaviors and thus degrades the cooperation capability. In this paper, we propose a novel Contrastive Identity-Aware learning (CIA) method that explicitly boosts the credit-level distinguishability of the VD network to break the bottleneck of multi-agent diversity. Specifically, our approach leverages contrastive learning to maximize the mutual information between the temporal credits and identity representations of different agents, encouraging the full expressiveness of credit assignment and, in turn, the emergence of individualities. The algorithmic implementation of the proposed CIA module is simple yet effective, and it can be readily incorporated into various VD architectures. Experiments on the SMAC benchmarks and across different VD backbones demonstrate that the proposed method yields results superior to its state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/CIA.
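The contrastive objective described in the abstract can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss with dot-product similarity, a common way to maximize a lower bound on mutual information; the abstract does not specify the actual loss or encoders. The function name `info_nce_loss`, the `temperature` value, and the toy 2-D embeddings are all hypothetical and are not taken from the paper or its repository.

```python
import math


def info_nce_loss(credit_embs, identity_embs, temperature=0.1):
    """Mean InfoNCE loss over agents (illustrative assumption, not the paper's code).

    credit_embs[i] is treated as the positive match for identity_embs[i];
    the identity embeddings of all other agents act as negatives.
    """
    n = len(credit_embs)
    total = 0.0
    for i in range(n):
        # Dot-product similarity between agent i's credit and every identity.
        logits = [
            sum(c * d for c, d in zip(credit_embs[i], identity_embs[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the correct identity.
        total += log_denominator - logits[i]
    return total / n


# Hypothetical embeddings for three agents.
identities = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
distinct_credits = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
homogeneous_credits = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

loss_distinct = info_nce_loss(distinct_credits, identities)
loss_homogeneous = info_nce_loss(homogeneous_credits, identities)
print(loss_distinct < loss_homogeneous)
```

In this toy setting, credits that align with their own agent's identity give a much lower loss than credits that are identical across agents, which is the sense in which such an objective would penalize an indistinguishable VD network.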
