Paper Title

Receding Horizon Inverse Reinforcement Learning

Paper Authors

Yiqing Xu, Wei Gao, David Hsu

Abstract

Inverse reinforcement learning (IRL) seeks to infer a cost function that explains the underlying goals and preferences of expert demonstrations. This paper presents receding horizon inverse reinforcement learning (RHIRL), a new IRL algorithm for high-dimensional, noisy, continuous systems with black-box dynamic models. RHIRL addresses two key challenges of IRL: scalability and robustness. To handle high-dimensional continuous systems, RHIRL matches the induced optimal trajectories with expert demonstrations locally in a receding horizon manner and 'stitches' together the local solutions to learn the cost; it thereby avoids the 'curse of dimensionality'. This contrasts sharply with earlier algorithms that match with expert demonstrations globally over the entire high-dimensional state space. To be robust against imperfect expert demonstrations and control noise, RHIRL learns a state-dependent cost function 'disentangled' from system dynamics under mild conditions. Experiments on benchmark tasks show that RHIRL outperforms several leading IRL algorithms in most instances. We also prove that the cumulative error of RHIRL grows linearly with the task duration.
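To make the receding-horizon idea from the abstract concrete, below is a minimal conceptual sketch of such a training loop: a short window is slid along the expert demonstration, a local trajectory is induced under the current cost, and the local updates are accumulated into a single cost function. This is not the authors' implementation; the planner, dynamics, gradient rule, and all names (`cost`, `plan_segment`, `dynamics`) are hypothetical stand-ins chosen only to illustrate the structure of the algorithm.

```python
# Conceptual sketch of receding-horizon IRL (illustrative only, not RHIRL itself).
import numpy as np

STATE_DIM, HORIZON, LR, EPOCHS = 4, 10, 1e-2, 50
rng = np.random.default_rng(0)

def cost(w, s):
    """State-dependent cost: a linear function of the state (placeholder features)."""
    return float(w @ s)

def dynamics(s, a):
    """Black-box dynamics stand-in with control noise."""
    return s + 0.1 * a + 0.01 * rng.standard_normal(STATE_DIM)

def plan_segment(w, s0):
    """Stand-in local planner: over a short horizon, greedily pick the
    lower-cost successor state, mimicking trajectory optimization under cost w."""
    traj, s = [s0], s0
    for _ in range(HORIZON - 1):
        candidates = [dynamics(s, a) for a in (-1.0, 1.0)]
        s = min(candidates, key=lambda x: cost(w, x))
        traj.append(s)
    return np.stack(traj)

# Fake "expert" demonstration: a noisy trajectory drifting toward the origin.
expert = (np.cumsum(-0.05 * np.ones((40, STATE_DIM)), axis=0)
          + 0.02 * rng.standard_normal((40, STATE_DIM)))

w = rng.standard_normal(STATE_DIM)  # cost parameters to learn
for _ in range(EPOCHS):
    grad = np.zeros(STATE_DIM)
    # Receding horizon: slide a short window along the demonstration and
    # match the locally induced trajectory to the local expert segment.
    for t in range(0, len(expert) - HORIZON, HORIZON):
        seg = expert[t:t + HORIZON]
        induced = plan_segment(w, seg[0])
        # Margin-style update: lower the cost of expert states, raise the cost
        # of the currently induced states (a crude stand-in for the true
        # gradient used by IRL methods).
        grad += seg.mean(axis=0) - induced.mean(axis=0)
    w -= LR * grad  # "stitch" the local updates into one global cost

print("learned cost weights:", np.round(w, 3))
```

Note that matching only within each short window is what keeps the optimization local; the learned cost itself is shared across all windows, which is the sense in which the local solutions are stitched together.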
