Paper Title
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration
Paper Authors
Paper Abstract
Deep Reinforcement Learning has been successfully applied to learn robotic control. However, the corresponding algorithms struggle when applied to problems where the agent is only rewarded after achieving a complex task. In this context, using demonstrations can significantly speed up the learning process, but demonstrations can be costly to acquire. In this paper, we propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. To do so, our method learns a goal-conditioned policy to control a system between successive low-dimensional goals. This sequential goal-reaching approach raises a problem of compatibility between successive goals: we need to ensure that the state resulting from reaching a goal is compatible with reaching the following goals. To tackle this problem, we present a new algorithm called DCIL-II. We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, as well as fast running with a simulated Cassie robot, with unprecedented sample efficiency. Our method, which leverages sequentiality, is a step towards solving complex robotic tasks with minimal specification effort, a key feature for the next generation of autonomous robots.
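To make the sequential goal-reaching idea from the abstract concrete, below is a minimal Python sketch of the control loop it describes: a single demonstration is subsampled into successive low-dimensional goals, and a goal-conditioned policy is chained through them, switching to the next goal once the current one is reached. All names here (`env`, `policy`, `project_to_goal_space`, `eps`) are hypothetical illustrations under assumed Gym-style interfaces, not the authors' actual DCIL-II implementation.

```python
import numpy as np

def extract_goals(demo_states, num_goals, project_to_goal_space):
    """Subsample a single demonstration into a sequence of
    low-dimensional goals (hypothetical projection function)."""
    idx = np.linspace(0, len(demo_states) - 1, num_goals, dtype=int)
    return [project_to_goal_space(demo_states[i]) for i in idx]

def sequential_rollout(env, policy, goals, project_to_goal_space,
                       eps=0.05, max_steps=1000):
    """Chain a goal-conditioned policy through the goal sequence.

    The policy is conditioned on the current goal; once the projection
    of the state lies within eps of that goal, we switch to the next one.
    Note the compatibility issue raised in the abstract: the full state
    reached at goal k (e.g. the robot's velocities) must still allow
    reaching goal k+1, which is the problem DCIL-II is designed to solve.
    """
    state = env.reset()
    k = 0
    for _ in range(max_steps):
        action = policy(state, goals[k])   # goal-conditioned policy
        state, _, done, _ = env.step(action)
        if np.linalg.norm(project_to_goal_space(state) - goals[k]) < eps:
            k += 1                         # current goal reached
            if k == len(goals):
                return True                # full task achieved
        if done:
            break
    return False
```

The design choice this sketch highlights is that goals are low-dimensional projections of states, so a naive switching rule like the one above ignores the unprojected part of the state; handling that mismatch between successive goals is the core contribution the abstract attributes to DCIL-II.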