A Visual Communication Map for Multi-Agent Deep Reinforcement Learning

Paper Title

A Visual Communication Map for Multi-Agent Deep Reinforcement Learning

Authors

Nguyen, Ngoc Duy; Nguyen, Thanh Thi; Creighton, Doug; Nahavandi, Saeid

Abstract

Deep reinforcement learning has been applied successfully to solve various real-world problems, and the number of its applications in multi-agent settings has been increasing. Multi-agent learning poses significant challenges in the effort to allocate a concealed communication medium. Agents receive thorough knowledge from the medium to determine subsequent actions in a distributed manner. The goal is to leverage the cooperation of multiple agents to achieve a designated objective efficiently. Recent studies typically combine a specialized neural network with reinforcement learning to enable communication between agents. This approach, however, limits the number of agents or requires the system to be homogeneous. In this paper, we propose a more scalable approach that not only handles a large number of agents but also enables collaboration between functionally dissimilar agents and is compatible with any deep reinforcement learning method. Specifically, we create a global communication map to represent the status of each agent in the system visually. The visual map and the environmental state are fed to a shared-parameter network to train multiple agents concurrently. Finally, we select the Asynchronous Advantage Actor-Critic (A3C) algorithm to demonstrate our proposed scheme, namely Visual communication map for Multi-agent A3C (VMA3C). Simulation results show that the use of a visual communication map improves the performance of A3C regarding learning speed, reward achievement, and robustness in multi-agent problems.
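
The abstract does not say how the communication map is drawn or how it is combined with the environmental state, so the following is only a minimal sketch under stated assumptions: each agent's status is a small integer, each agent gets one square cell whose brightness encodes that status, and the map is stacked with the environment frame as a second input channel. The function names `build_communication_map` and `build_observation`, and the parameters `cell` and `levels`, are hypothetical and do not come from the paper.

```python
def build_communication_map(agent_statuses, cell=4, levels=4):
    """Draw one cell x cell block per agent, left to right.

    Assumption (not from the paper): a status is an integer in
    [0, levels), encoded as brightness (status + 1) / levels so that
    0.0 stays reserved for "no agent here".
    """
    row = []
    for status in agent_statuses:
        if not 0 <= status < levels:
            raise ValueError(f"status {status} outside [0, {levels})")
        row.extend([(status + 1) / levels] * cell)
    # Repeat the same row `cell` times to make square blocks.
    return [list(row) for _ in range(cell)]


def build_observation(env_frame, comm_map):
    """Stack the environment frame and the map as two channels.

    The map is zero-padded to the frame size and placed in the
    top-left corner (an illustrative layout choice).
    """
    height, width = len(env_frame), len(env_frame[0])
    map_h, map_w = len(comm_map), len(comm_map[0])
    if map_h > height or map_w > width:
        raise ValueError("communication map does not fit in the frame")
    padded = [[0.0] * width for _ in range(height)]
    for r in range(map_h):
        padded[r][:map_w] = comm_map[r]
    frame = [[float(v) for v in line] for line in env_frame]
    return [frame, padded]  # shape: [2][height][width]


if __name__ == "__main__":
    frame = [[0] * 8 for _ in range(8)]
    comm = build_communication_map([0, 3], cell=2)
    obs = build_observation(frame, comm)
    print(len(obs), len(obs[0]), len(obs[0][0]))
```

In this sketch every agent would receive the same two-channel observation format, which is what lets a single shared-parameter network consume it regardless of how many agents contribute a cell to the map.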
