Paper Title
Abstract Value Iteration for Hierarchical Reinforcement Learning
Paper Authors
Abstract
We propose a novel hierarchical reinforcement learning framework for control with continuous state and action spaces. In our framework, the user specifies subgoal regions which are subsets of states; then, we (i) learn options that serve as transitions between these subgoal regions, and (ii) construct a high-level plan in the resulting abstract decision process (ADP). A key challenge is that the ADP may not be Markov, which we address by proposing two algorithms for planning in the ADP. Our first algorithm is conservative, allowing us to prove theoretical guarantees on its performance, which help inform the design of subgoal regions. Our second algorithm is a practical one that interweaves planning at the abstract level and learning at the concrete level. In our experiments, we demonstrate that our approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.
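To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation) of value iteration over an abstract decision process: the ADP's "states" are user-specified subgoal regions, its "actions" are options that transition between regions, and the specific regions, success probabilities, and rewards shown here are illustrative assumptions. Failed options are assumed, for this sketch only, to leave the agent in its current region.

```python
# Toy ADP: each subgoal region maps to a list of options. Each option
# records its target region, an estimated success probability, and a
# per-step reward. All numbers here are made up for illustration.
adp = {
    "start": [{"to": "door", "p": 0.9, "r": -1.0}],
    "door":  [{"to": "goal", "p": 0.8, "r": -1.0},
              {"to": "start", "p": 1.0, "r": -1.0}],
    "goal":  [],  # terminal region
}

def abstract_value_iteration(adp, gamma=0.95, iters=200):
    """Compute a value for each subgoal region by value iteration
    over the abstract (region-level) transition graph.

    Assumption for this sketch: a failed option keeps the agent in
    the same region, so the Bellman backup mixes the target region's
    value and the current region's value by the success probability.
    """
    V = {g: 0.0 for g in adp}
    for _ in range(iters):
        for g, options in adp.items():
            if not options:  # terminal regions keep value 0
                continue
            V[g] = max(
                o["r"] + gamma * (o["p"] * V[o["to"]] + (1 - o["p"]) * V[g])
                for o in options
            )
    return V

values = abstract_value_iteration(adp)
```

A high-level plan then simply follows, from each region, the option achieving the maximum in the backup; the paper's contribution lies in handling the case where this abstract process is not Markov, which the fixed transition probabilities above assume away.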