Paper Title

Reparameterized Variational Divergence Minimization for Stable Imitation

Paper Authors

Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan

Paper Abstract

While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, works exploring the imitation learning from observation (ILO) setting, where trajectories \textit{only} contain expert observations, have not been met with the same success. Inspired by recent investigations of $f$-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we here examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. Unfortunately, we find that $f$-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning to alleviate the optimization challenges of the promising $f$-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
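
For background, the $f$-divergence minimization framework referenced in the abstract builds on the standard variational lower bound underlying $f$-GAN-style methods and the cited works. A minimal sketch of that bound, in notation chosen here for illustration (expert occupancy distribution $\rho^{E}$, policy occupancy distribution $\rho^{\pi}$, and a discriminator-like function $T$, none of which are taken from this page):

$$
D_f\!\left(\rho^{E} \,\middle\|\, \rho^{\pi}\right) \;\geq\; \sup_{T}\; \mathbb{E}_{x \sim \rho^{E}}\!\big[T(x)\big] \;-\; \mathbb{E}_{x \sim \rho^{\pi}}\!\big[f^{*}\!\big(T(x)\big)\big],
$$

where $f^{*}$ is the convex conjugate of $f$. Adversarial imitation trains $T$ to tighten this bound while the policy $\pi$ is optimized via reinforcement learning to minimize it; in the ILO setting, $x$ ranges over observations or observation transitions rather than state-action pairs. The numerical instabilities and the reparameterization trick mentioned in the abstract concern how this inner-outer optimization is carried out; the specifics are in the paper itself.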
