Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft

Paper Title


Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft

Authors

Khan, Muhammad Junaid; Ahmed, Syed Hammad; Sukthankar, Gita

Abstract


The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challenging benchmark problem for cooperative multi-agent reinforcement learning (MARL). SMAC focuses exclusively on the problem of StarCraft micromanagement and assumes that each unit is controlled individually by a learning agent that acts independently and only possesses local information; centralized training with decentralized execution (CTDE) is assumed. To perform well in SMAC, MARL algorithms must handle the dual problems of multi-agent credit assignment and joint action evaluation. This paper introduces a new architecture, TransMix, a transformer-based joint action-value mixing network that we show to be efficient and scalable compared with other state-of-the-art cooperative MARL solutions. TransMix leverages the ability of transformers to learn a richer mixing function for combining the agents' individual value functions. It achieves performance comparable to previous work on easy SMAC scenarios and outperforms other techniques on hard scenarios, as well as on scenarios that are corrupted with Gaussian noise to simulate fog of war.
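The abstract describes TransMix only at a high level: a mixing network combines each agent's individual value Q_i into a joint action value Q_tot, and a transformer supplies a richer mixing function than simpler schemes. The sketch below is a hypothetical illustration of that general idea, not the authors' architecture. It contrasts a plain VDN-style sum with a single scaled dot-product attention step in which a vector derived from the global state is the query and per-agent feature vectors are the keys. The function names `vdn_mix` and `attention_mix`, the feature and query vectors, and the non-negative-weight (QMIX-style monotonic) constraint are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def vdn_mix(agent_qs):
    """VDN-style baseline: Q_tot is the plain sum of per-agent Q-values."""
    return sum(agent_qs)


def attention_mix(agent_qs, agent_feats, state_query):
    """Hypothetical attention-based mixer (NOT the TransMix architecture).

    agent_qs:    Q_i(o_i, a_i) for each agent's chosen action
    agent_feats: one key vector per agent (assumed, e.g. an embedding of o_i)
    state_query: query vector derived from the global state s (assumed)

    Returns (Q_tot, weights). Softmax weights are non-negative, so
    dQ_tot/dQ_i = n * w_i >= 0: the mix is monotonic in every Q_i.
    """
    d = len(state_query)
    # Scaled dot-product scores: one score per agent.
    scores = [dot(state_query, k) / math.sqrt(d) for k in agent_feats]
    weights = softmax(scores)
    n = len(agent_qs)
    # Scaling by n makes uniform attention reproduce the VDN sum.
    q_tot = sum(n * w * q for w, q in zip(weights, agent_qs))
    return q_tot, weights


if __name__ == "__main__":
    qs = [1.0, 2.0, 3.0]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # A zero query gives uniform attention, so Q_tot matches the VDN sum.
    print(vdn_mix(qs), attention_mix(qs, feats, [0.0, 0.0])[0])
    # A non-zero query reweights agents according to the (assumed) state.
    print(attention_mix(qs, feats, [0.5, -0.2]))
```

With a zero query the attention weights are uniform and the mixer reduces to the VDN sum; a state-dependent query lets the mixer shift credit toward particular agents, while the softmax keeps every weight positive so that raising any Q_i never lowers Q_tot. A real transformer mixer would add learned projections, multiple heads, self-attention among agent tokens, and feed-forward layers, none of which this toy includes.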
