Paper Title
Maximum Entropy Multi-Task Inverse RL
Authors
Abstract
Multi-task IRL allows for the possibility that the expert could be switching between multiple ways of solving the same problem, or interleaving demonstrations of multiple tasks. The learner aims to learn the multiple reward functions that guide these ways of solving the problem. We present a new method for multi-task IRL that generalizes the well-known maximum entropy approach to IRL by combining it with Dirichlet process-based clustering of the observed input. This yields a single nonlinear optimization problem, called MaxEnt Multi-task IRL, which can be solved using Lagrangian relaxation and gradient descent. We evaluate MaxEnt Multi-task IRL in simulation on the robotic task of sorting onions on a processing line, where the expert utilizes multiple ways of detecting and removing blemished onions. The method learns the underlying reward functions to a high level of accuracy and improves on previous approaches to multi-task IRL.
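The combination the abstract describes (clustering demonstrations while fitting a maximum-entropy reward per cluster) can be illustrated with a toy sketch. This is not the paper's algorithm: it replaces the Dirichlet process mixture with a fixed number of clusters, represents each demonstration by a single feature vector, and alternates EM-style soft assignment with one MaxEnt-style gradient step per cluster. The function name, the simplified gradient, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np

def maxent_multitask_irl_sketch(traj_feats, n_clusters=2, n_iters=200,
                                lr=0.1, seed=0):
    """Toy stand-in for MaxEnt Multi-task IRL (illustrative, not the paper's
    method): soft-cluster trajectory feature vectors, then nudge each
    cluster's linear reward weights to match the features of the
    trajectories that cluster explains."""
    rng = np.random.default_rng(seed)
    n, d = traj_feats.shape
    w = rng.normal(scale=0.1, size=(n_clusters, d))  # reward weights per cluster

    for _ in range(n_iters):
        # Each cluster defines p_k(tau_i) ∝ exp(w_k · phi_i) over the
        # finite trajectory set; compute log p_k with a stable log-sum-exp.
        scores = traj_feats @ w.T                         # (n, K)
        m = scores.max(axis=0)
        log_z = m + np.log(np.exp(scores - m).sum(axis=0))
        loglik = scores - log_z                           # (n, K), all <= 0

        # E-step: responsibilities (uniform mixing weights assumed).
        shifted = loglik - loglik.max(axis=1, keepdims=True)
        resp = np.exp(shifted)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: one gradient step of responsibility-weighted log-likelihood,
        # i.e. demonstration features minus expected features (MaxEnt-style).
        for k in range(n_clusters):
            r_k = resp[:, k]
            demo_feat = (r_k[:, None] * traj_feats).sum(0) / r_k.sum()
            p_k = np.exp(loglik[:, k])                    # sums to 1
            expected_feat = p_k @ traj_feats
            w[k] += lr * (demo_feat - expected_feat)

    return w, resp
```

The E-step plays the role the Dirichlet process plays in the paper (deciding which reward function generated each demonstration), while the M-step is the familiar MaxEnt IRL feature-matching gradient, here restricted to the observed trajectory set for simplicity.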