Paper Title
Reparameterized Variational Divergence Minimization for Stable Imitation
Paper Authors
Paper Abstract
While state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories \textit{only} contain expert observations, have not been met with the same success. Inspired by recent investigations of $f$-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we examine here the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. Unfortunately, we find that $f$-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning that alleviates the optimization challenges of the promising $f$-divergence minimization framework. Empirically, we demonstrate that our design choices allow ILO algorithms to outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
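As context for the $f$-divergence minimization framework the abstract builds on, below is a minimal sketch (not the authors' implementation) of the variational lower bound $D_f(P \| Q) \geq \sup_T \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))]$ that underlies adversarial imitation learning, instantiated for the KL divergence (where the convex conjugate is $f^*(t) = e^{t-1}$). The discriminator architecture, `obs_dim`, and the random batches are illustrative placeholders standing in for expert data and policy rollouts.

```python
# Minimal sketch of variational f-divergence estimation for adversarial
# imitation learning, assuming the KL instantiation f*(t) = exp(t - 1).
# All dimensions and data here are hypothetical placeholders.
import torch
import torch.nn as nn

obs_dim = 8  # hypothetical observation dimensionality

# Discriminator T realizing the variational bound.
T = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(T.parameters(), lr=1e-3)

def kl_lower_bound(expert_obs, policy_obs):
    # E_P[T(x)] - E_Q[exp(T(x) - 1)] lower-bounds KL(P || Q).
    return T(expert_obs).mean() - torch.exp(T(policy_obs) - 1).mean()

# One discriminator ascent step on placeholder observation batches.
expert_obs = torch.randn(128, obs_dim)  # stands in for expert data
policy_obs = torch.randn(128, obs_dim)  # stands in for policy rollouts

loss = -kl_lower_bound(expert_obs, policy_obs)  # ascend = minimize negative
opt.zero_grad()
loss.backward()
opt.step()

# The imitating policy would then be trained with RL to *minimize* this
# bound, e.g., by treating (a transformation of) -T(x) as its reward;
# the exp(.) term above is one source of the numerical instability the
# abstract mentions, which motivates reparameterizing the objective.
```

Note that this sketch only illustrates the discriminator side of the minimax objective; the paper's contribution concerns how the policy-side minimization is reparameterized to remain numerically stable in the ILO setting.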