Paper Title
Reparameterized Variational Divergence Minimization for Stable Imitation
Paper Authors
Paper Abstract
While state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories \textit{only} contain expert observations, have not been met with the same success. Inspired by recent investigations of $f$-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we examine here the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. Unfortunately, we find that $f$-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning that alleviates the optimization challenges of the promising $f$-divergence minimization framework. Empirically, we demonstrate that our design choices allow ILO algorithms to outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
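As context for the $f$-divergence minimization framework the abstract builds on, below is a minimal sketch (not the authors' implementation) of the variational lower bound $D_f(P \| Q) \geq \sup_T \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))]$ that underlies adversarial imitation learning, instantiated for the KL divergence (where the convex conjugate is $f^*(t) = e^{t-1}$). The discriminator architecture, `obs_dim`, and the random batches are illustrative placeholders standing in for expert data and policy rollouts.

```python
# Minimal sketch of variational f-divergence estimation for adversarial
# imitation learning, assuming the KL instantiation f*(t) = exp(t - 1).
# All dimensions and data here are hypothetical placeholders.
import torch
import torch.nn as nn

obs_dim = 8  # hypothetical observation dimensionality

# Discriminator T realizing the variational bound.
T = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(T.parameters(), lr=1e-3)

def kl_lower_bound(expert_obs, policy_obs):
    # E_P[T(x)] - E_Q[exp(T(x) - 1)] lower-bounds KL(P || Q).
    return T(expert_obs).mean() - torch.exp(T(policy_obs) - 1).mean()

# One discriminator ascent step on placeholder observation batches.
expert_obs = torch.randn(128, obs_dim)  # stands in for expert data
policy_obs = torch.randn(128, obs_dim)  # stands in for policy rollouts

loss = -kl_lower_bound(expert_obs, policy_obs)  # ascend = minimize negative
opt.zero_grad()
loss.backward()
opt.step()

# The imitating policy would then be trained with RL to *minimize* this
# bound, e.g., by treating (a transformation of) -T(x) as its reward;
# the exp(.) term above is one source of the numerical instability the
# abstract mentions, which motivates reparameterizing the objective.
```

Note that this sketch only illustrates the discriminator side of the minimax objective; the paper's contribution concerns how the policy-side minimization is reparameterized to remain numerically stable in the ILO setting.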