Paper title
Credit-cognisant reinforcement learning for multi-agent cooperation
Paper authors
Paper abstract
Traditional multi-agent reinforcement learning (MARL) algorithms, such as independent Q-learning, struggle when presented with partially observable scenarios and when agents are required to develop delicate action sequences. This is often because the reward for a good action only becomes available after other agents have taken theirs, and these actions are not credited accordingly. Recurrent neural networks have proven to be a viable solution strategy for these types of problems, yielding significant performance increases compared to other methods. In this paper, we explore a different approach and focus on the experiences used to update the action-value function of each agent. We introduce the concept of credit-cognisant rewards (CCRs), which allow an agent to perceive the effect its actions have on the environment as well as on its co-agents. We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning as well as deep recurrent Q-learning. We evaluate and test the performance of CCRs when applied to these deep reinforcement learning techniques on a simplified version of the popular card game Hanabi.
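The abstract describes the core mechanism only at a high level: each stored experience has its reward reconstructed to include the rewards earned by co-agents within the same action sequence. The sketch below illustrates one plausible reading of that idea; the aggregation rule, the discount on co-agents' rewards, and the function name are assumptions for illustration, not the paper's exact formulation.

```python
def credit_cognisant_rewards(joint_rewards, gamma=1.0):
    """Hypothetical sketch of a credit-cognisant reward (CCR) computation.

    joint_rewards: per-step rewards for one joint action sequence, one entry
                   per agent acting in turn, e.g. [r_agent0, r_agent1, r_agent2].
    gamma: optional discount on co-agents' later rewards (an assumption here;
           the paper may simply sum them).
    """
    n = len(joint_rewards)
    ccrs = []
    for i in range(n):
        # An agent's CCR combines its own reward with the (optionally
        # discounted) rewards its co-agents receive later in the same sequence,
        # so a set-up move is credited for the pay-off it enables.
        ccr = sum(gamma ** (j - i) * joint_rewards[j] for j in range(i, n))
        ccrs.append(ccr)
    return ccrs

# Example: in a 3-agent round, agent 0's good move only pays off after
# agents 1 and 2 act; the CCR credits agent 0 with those later rewards too.
print(credit_cognisant_rewards([0.0, 1.0, 1.0]))  # -> [2.0, 2.0, 1.0]
```

Under this reading, the CCR would replace the single-agent reward in each stored transition before the independent deep Q-learning or deep recurrent Q-learning update, leaving the rest of the learning algorithm unchanged.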