Paper Title

Augmenting GAIL with BC for sample efficient imitation learning

Paper Authors

Rohit Jena, Changliu Liu, Katia Sycara

Paper Abstract

Imitation learning is the problem of recovering an expert policy without access to a reward signal. Behavior cloning and GAIL are two widely used methods for performing imitation learning. Behavior cloning converges in a few iterations, but does not achieve peak performance due to its inherent i.i.d. assumption about the state-action distribution. GAIL addresses this issue by accounting for temporal dependencies when matching the state distributions of the agent and the expert. Although GAIL is sample-efficient in the number of expert trajectories required, it is still not very sample-efficient in terms of the environment interactions needed for the policy to converge. Given the complementary benefits of the two methods, we present a simple and elegant way to combine them, enabling stable and sample-efficient learning. Our algorithm is easy to implement and integrates with different policy gradient algorithms. We demonstrate its effectiveness on low-dimensional control tasks, gridworlds, and high-dimensional image-based tasks.
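The abstract describes augmenting GAIL's policy-gradient update with a behavior-cloning objective. Below is a minimal sketch, assuming a PyTorch diagonal-Gaussian policy, of one way such a combination could look: a BC negative log-likelihood on expert data is mixed with a policy-gradient loss whose advantages would come from a GAIL discriminator reward. The class and function names, the `bc_weight` coefficient, and the fixed-weight mixing are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Simple diagonal-Gaussian policy for continuous control (illustrative)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())


def combined_policy_loss(policy, agent_obs, agent_act, advantages,
                         expert_obs, expert_act, bc_weight=0.5):
    """Weighted sum of a policy-gradient loss (with GAIL-derived advantages)
    and a BC negative log-likelihood on expert state-action pairs.
    The weighting scheme here is an assumption for illustration."""
    # Policy-gradient term: advantages are assumed to be computed from the
    # GAIL discriminator's reward signal elsewhere in the training loop.
    logp_agent = policy.dist(agent_obs).log_prob(agent_act).sum(-1)
    pg_loss = -(logp_agent * advantages).mean()

    # BC term: maximize likelihood of expert actions under the current policy.
    logp_expert = policy.dist(expert_obs).log_prob(expert_act).sum(-1)
    bc_loss = -logp_expert.mean()

    return (1.0 - bc_weight) * pg_loss + bc_weight * bc_loss
```

In a training loop, this combined loss would stand in for the usual GAIL generator update; how `bc_weight` is chosen or annealed over training is a design choice not specified here.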
