Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

Title

Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

Authors

Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Adrian Weller, Volkan Cevher

Abstract

We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner. Specifically, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper bound on the learner's performance degradation based on the $\ell_1$-distance between the transition dynamics of the expert and the learner. Leveraging insights from the Robust RL literature, we propose a robust MCE IRL algorithm, which is a principled approach to help with this mismatch. Finally, we empirically demonstrate the stable performance of our algorithm compared to the standard MCE IRL algorithm under transition dynamics mismatches in both finite and continuous MDP problems.
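To make the $\ell_1$-distance between transition dynamics concrete, here is a minimal sketch for a finite MDP. It is not taken from the paper: the abstract does not say how the per-state-action distances are aggregated, so taking the worst case (maximum) over state-action pairs is an assumption, and the function name `l1_dynamics_distance` and the nested-list layout `T[s][a][s_next]` are hypothetical choices made only for illustration.

```python
def l1_dynamics_distance(T_expert, T_learner):
    """Worst-case L1 distance between two transition models.

    T[s][a] is a list of next-state probabilities for state s and action a.
    Assumption: distances are aggregated with a max over (s, a) pairs.
    """
    worst = 0.0
    for actions_e, actions_l in zip(T_expert, T_learner):
        for p_e, p_l in zip(actions_e, actions_l):
            # L1 distance between the two next-state distributions
            dist = sum(abs(x - y) for x, y in zip(p_e, p_l))
            worst = max(worst, dist)
    return worst


# Toy example: two states, one action; the models differ only in state 0.
T_expert = [[[0.9, 0.1]], [[0.2, 0.8]]]
T_learner = [[[0.7, 0.3]], [[0.2, 0.8]]]
d = l1_dynamics_distance(T_expert, T_learner)
```

In this toy case the only disagreement is in state 0, so `d` equals the $\ell_1$ distance between `[0.9, 0.1]` and `[0.7, 0.3]`. This is the quantity the abstract's bound on performance degradation is stated in terms of, up to the aggregation assumption noted above.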
