Paper Title
Macro-Action-Based Deep Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.
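The abstract names macro-action trajectory replay buffers but does not spell out their mechanics. As a rough illustration of the core idea, the minimal Python sketch below shows one plausible per-agent buffer: rewards received while a macro-action is still executing are accumulated into a single macro-level transition, which is committed only when the macro-action terminates. The class and method names (MacroActionReplayBuffer, start_macro_action, step) are hypothetical and not taken from the paper.

```python
import random
from collections import deque


class MacroActionReplayBuffer:
    """Hypothetical per-agent macro-action replay buffer (illustrative only).

    Instead of storing one transition per primitive time step, rewards are
    accumulated (with discounting) over the duration of the running
    macro-action, and a single macro-level transition is recorded when the
    macro-action terminates.
    """

    def __init__(self, capacity=100_000, gamma=0.99):
        self.buffer = deque(maxlen=capacity)
        self.gamma = gamma
        # Pending entry: [obs, macro_action, accumulated_reward, steps_elapsed]
        self._pending = None

    def start_macro_action(self, obs, macro_action):
        # Begin accumulating reward for a newly selected macro-action.
        self._pending = [obs, macro_action, 0.0, 0]

    def step(self, reward, next_obs, macro_action_done, env_done):
        # Fold each primitive-step reward into the running macro-action.
        obs, m, acc, k = self._pending
        acc += (self.gamma ** k) * reward
        self._pending = [obs, m, acc, k + 1]
        if macro_action_done or env_done:
            # Commit one macro-level transition on macro-action termination.
            self.buffer.append((obs, m, acc, next_obs, env_done))
            self._pending = None

    def sample(self, batch_size):
        # Uniform sampling of macro-level transitions for DQN-style updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

In a multi-agent setting, the centralized variant would additionally have to align transitions across agents whose macro-actions terminate at different times; handling that asynchrony is precisely what distinguishes the buffers introduced in the paper from a standard primitive-action replay buffer.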