Paper Title
Learning to Play by Imitating Humans
Paper Authors
Paper Abstract
Acquiring multiple skills has commonly involved collecting a large number of expert demonstrations per task or engineering custom reward functions. Recently, it has been shown that it is possible to acquire a diverse set of skills by self-supervising control on top of human teleoperated play data. Play is rich in state space coverage, and a policy trained on this data can generalize to specific tasks at test time, outperforming policies trained on individual expert task demonstrations. In this work, we explore the question of whether robots can learn to play in order to autonomously generate play data that ultimately improves performance. By training a behavioral cloning policy on a relatively small quantity of human play, we autonomously generate a large quantity of cloned play data that can be used as additional training data. We demonstrate that a general-purpose goal-conditioned policy trained on this augmented dataset substantially outperforms one trained only on the original human data, across 18 difficult user-specified manipulation tasks in a simulated robotic tabletop environment. A video example of a robot imitating human play can be seen here: https://learning-to-play.github.io/videos/undirected_play1.mp4
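As a rough illustration of the three-stage pipeline the abstract describes (clone human play, roll the clone out to generate more play, then train a goal-conditioned policy on the combined data), here is a minimal PyTorch sketch. It assumes low-dimensional state and action vectors and an old gym-style environment interface; every name in it (`train_bc`, `generate_cloned_play`, `relabel_with_goals`, `DummyEnv`) and the random placeholder data are hypothetical, not the paper's actual implementation.

```python
import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 32, 8  # assumed dimensions, for illustration only

def mlp(in_dim, out_dim, hidden=256):
    # Small MLP used for both the play-cloning and goal-conditioned policies.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

def train_bc(policy, inputs, targets, steps=1000, lr=1e-3, batch=256):
    # Behavioral cloning: regress demonstrated actions with an MSE loss.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(len(inputs), (batch,))
        loss = nn.functional.mse_loss(policy(inputs[idx]), targets[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

class DummyEnv:
    # Stand-in for the simulated tabletop environment (old gym-style API).
    def reset(self):
        return np.zeros(STATE_DIM, dtype=np.float32)
    def step(self, action):
        return np.random.randn(STATE_DIM).astype(np.float32), 0.0, False, {}

def generate_cloned_play(play_policy, env, n_steps):
    # Roll out the play-cloning policy to autonomously collect more play data.
    states, actions = [], []
    s = env.reset()
    for _ in range(n_steps):
        with torch.no_grad():
            a = play_policy(torch.as_tensor(s)).numpy()
        states.append(s)
        actions.append(a)
        s, _, done, _ = env.step(a)
        if done:
            s = env.reset()
    return torch.as_tensor(np.stack(states)), torch.as_tensor(np.stack(actions))

def relabel_with_goals(states, actions, horizon=32):
    # Hindsight relabeling: the state reached `horizon` steps later serves as
    # the goal for each (state, action) pair (episode boundaries ignored here).
    s, g, a = states[:-horizon], states[horizon:], actions[:-horizon]
    return torch.cat([s, g], dim=-1), a

# Placeholder for the teleoperated human play dataset.
human_states = torch.randn(5000, STATE_DIM)
human_actions = torch.randn(5000, ACTION_DIM)

# 1) Clone undirected human play with a behavioral-cloning policy.
play_policy = train_bc(mlp(STATE_DIM, ACTION_DIM), human_states, human_actions)

# 2) Autonomously generate a large quantity of cloned play data.
cloned_s, cloned_a = generate_cloned_play(play_policy, DummyEnv(), 20000)

# 3) Train a goal-conditioned policy on the augmented (human + cloned) dataset.
aug_s = torch.cat([human_states, cloned_s])
aug_a = torch.cat([human_actions, cloned_a])
gc_inputs, gc_targets = relabel_with_goals(aug_s, aug_a)
gc_policy = train_bc(mlp(2 * STATE_DIM, ACTION_DIM), gc_inputs, gc_targets)
```

This sketch deliberately simplifies: it uses plain MSE regression for both stages and relabels goals across trajectory boundaries, whereas the actual system operates on richer observations and policy architectures. It is meant only to make the clone-generate-retrain loop concrete.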