Paper Title
Behavior Priors for Efficient Reinforcement Learning
Paper Authors
Paper Abstract
As we deploy reinforcement learning agents to solve increasingly challenging problems, methods that allow us to inject prior knowledge about the structure of the world and about effective solution strategies become increasingly important. In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors that capture the common movement and interaction patterns shared across a set of related tasks or contexts. For example, the day-to-day behavior of humans comprises distinctive locomotion and manipulation patterns that recur across many different situations and goals. We discuss how such behavior patterns can be captured with probabilistic trajectory models and how these models can be integrated effectively into reinforcement learning schemes, e.g., to facilitate multi-task and transfer learning. We then extend these ideas to latent variable models and consider a formulation that learns hierarchical priors which capture different aspects of behavior in reusable modules. We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and to mutual-information- and curiosity-based objectives, thereby offering an alternative perspective on existing ideas. We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
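As a minimal sketch of how a learned behavior prior can be integrated into a reinforcement learning scheme of the kind the abstract describes, the Python snippet below regularizes the per-step task reward with a KL penalty that discourages the policy from deviating from the prior. The diagonal-Gaussian action distributions, the coefficient alpha, and the function names (kl_gaussians, regularized_reward) are illustrative assumptions for this sketch, not the paper's exact implementation.

# Hypothetical sketch: KL-regularized reward with a Gaussian behavior prior.
# Names and the choice of diagonal Gaussians are illustrative assumptions.
import numpy as np

def kl_gaussians(mu_p, std_p, mu_q, std_q):
    """KL( N(mu_p, diag(std_p^2)) || N(mu_q, diag(std_q^2)) ) for diagonal Gaussians."""
    var_p, var_q = std_p ** 2, std_q ** 2
    return np.sum(np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5)

def regularized_reward(task_reward, policy_mu, policy_std, prior_mu, prior_std, alpha=0.1):
    """Task reward minus an alpha-weighted KL penalty towards the behavior prior."""
    return task_reward - alpha * kl_gaussians(policy_mu, policy_std, prior_mu, prior_std)

# Usage example: the policy's action distribution stays close to the prior's
# habitual movement pattern, so the KL penalty is small.
r = regularized_reward(
    task_reward=1.0,
    policy_mu=np.array([0.2, -0.1]), policy_std=np.array([0.3, 0.3]),
    prior_mu=np.array([0.0, 0.0]),   prior_std=np.array([0.5, 0.5]),
)
print(r)

Summing this regularized reward over a trajectory yields a KL-regularized objective in which the prior acts as a default behavior that the agent only overrides when the task reward justifies it, which is one way the multi-task and transfer benefits described above can arise.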