Learning Active Task-Oriented Exploration Policies for Bridging the Sim-to-Real Gap

Paper Title

Learning Active Task-Oriented Exploration Policies for Bridging the Sim-to-Real Gap

Authors

Jacky Liang, Saumya Saxena, Oliver Kroemer

Abstract

Training robotic policies in simulation suffers from the sim-to-real gap, as simulated dynamics can be different from real-world dynamics. Past works tackled this problem through domain randomization and online system-identification. The former is sensitive to the manually-specified training distribution of dynamics parameters and can result in behaviors that are overly conservative. The latter requires learning policies that concurrently perform the task and generate useful trajectories for system identification. In this work, we propose and analyze a framework for learning exploration policies that explicitly perform task-oriented exploration actions to identify task-relevant system parameters. These parameters are then used by model-based trajectory optimization algorithms to perform the task in the real world. We instantiate the framework in simulation with the Linear Quadratic Regulator as well as in the real world with pouring and object dragging tasks. Experiments show that task-oriented exploration helps model-based policies adapt to systems with initially unknown parameters, and it leads to better task performance than task-agnostic exploration.
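
To make the "explore, identify, then control" pipeline in the abstract concrete, here is a minimal sketch on a scalar linear system with a Linear Quadratic Regulator. It is not the authors' method: random excitation stands in for the learned task-oriented exploration policy, least squares stands in for system identification, and every name and value (`explore`, `identify`, `lqr_gain`, `a_true`, `b_true`, the cost weights `q` and `r`) is a hypothetical choice made for illustration.

```python
import numpy as np


def explore(a_true, b_true, steps=20, seed=0):
    """Roll out x_{t+1} = a*x_t + b*u_t under random inputs.

    Assumption: random excitation replaces the learned exploration policy.
    """
    rng = np.random.default_rng(seed)
    x = 1.0
    X, U, X_next = [], [], []
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        x_new = a_true * x + b_true * u
        X.append(x)
        U.append(u)
        X_next.append(x_new)
        x = x_new
    return np.array(X), np.array(U), np.array(X_next)


def identify(X, U, X_next):
    """Estimate (a, b) by least squares on the exploration data."""
    features = np.column_stack([X, U])
    theta, *_ = np.linalg.lstsq(features, X_next, rcond=None)
    return theta[0], theta[1]


def lqr_gain(a, b, q=1.0, r=0.1, iters=500):
    """Iterate the scalar discrete-time Riccati equation, return gain K (u = -K*x)."""
    p = q
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
    return (b * p * a) / (r + b * p * b)


# Hypothetical "real" system, unknown to the controller and open-loop unstable.
a_true, b_true = 1.2, 0.5

X, U, X_next = explore(a_true, b_true)
a_hat, b_hat = identify(X, U, X_next)

# The model-based controller is designed from the identified parameters only.
K = lqr_gain(a_hat, b_hat)
closed_loop_pole = a_true - b_true * K

print(f"a_hat={a_hat:.4f}, b_hat={b_hat:.4f}, K={K:.4f}")
print(f"closed-loop pole magnitude: {abs(closed_loop_pole):.4f}")
```

In this noise-free toy every parameter is easy to identify, so random inputs suffice. The abstract's argument concerns the harder setting in which only some parameters matter for the task and exploration should be directed at those.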
