Learning with AMIGo: Adversarially Motivated Intrinsic Goals

Paper Title

Learning with AMIGo: Adversarially Motivated Intrinsic Goals

Authors

Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette

Abstract

A key challenge for reinforcement learning (RL) is learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating, as a form of meta-learning, a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging yet achievable goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals, which ultimately allows the agent to solve challenging procedurally generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.
