Paper Title

A Unifying Framework for Reinforcement Learning and Planning

Authors

Moerland, Thomas M., Broekens, Joost, Plaat, Aske, Jonker, Catholijn M.

Abstract

Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight into the algorithmic design space of planning and reinforcement learning.
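As background for the abstract's framing, MDP optimization can be illustrated with a classic planning method, value iteration. The sketch below is not from the paper: the 2-state MDP, its transition probabilities `P`, rewards `R`, and discount `GAMMA` are all hypothetical, chosen only to make the Bellman backup concrete.

```python
# Hypothetical 2-state MDP (illustration only, not from the paper).
# States: 0, 1. Actions: 0 ("stay"), 1 ("move").
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 0.9), (0, 0.1)]},
    1: {0: [(1, 1.0)], 1: [(0, 0.9), (1, 0.1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
GAMMA = 0.9  # discount factor

def value_iteration(tol=1e-8):
    """Iterate the Bellman optimality backup until the value change is < tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Q-value of each action: reward plus discounted expected next value.
            q = [R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
                 for a in P[s]]
            new_v = max(q)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

V = value_iteration()
```

A model-free RL method such as Q-learning would estimate the same optimal values from sampled transitions instead of sweeping over the known model, which is one of the dimensions along which the paper's framework distinguishes planning from learning.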
