Learning Skills to Patch Plans Based on Inaccurate Models

Paper Title

Learning Skills to Patch Plans Based on Inaccurate Models

Authors

Alex LaGrassa, Steven Lee, Oliver Kroemer

Abstract

Planners using accurate models can be effective for accomplishing manipulation tasks in the real world, but are typically highly specialized and require significant fine-tuning to be reliable. Meanwhile, learning is useful for adaptation, but can require a substantial amount of data collection. In this paper, we propose a method that improves the efficiency of sub-optimal planners with approximate but simple and fast models by switching to a model-free policy when unexpected transitions are observed. Unlike previous work, our method specifically addresses when the planner fails due to transition model error by patching with a local policy only where needed. First, we use a sub-optimal model-based planner to perform a task until model failure is detected. Next, we learn a local model-free policy from expert demonstrations to complete the task in regions where the model failed. To show the efficacy of our method, we perform experiments with a shape insertion puzzle and compare our results to both pure planning and imitation learning approaches. We then apply our method to a door opening task. Our experiments demonstrate that our patch-enhanced planner performs more reliably than pure planning and with lower overall sample complexity than pure imitation learning.
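
The control flow described in the abstract (plan with an approximate model, detect a transition the model did not predict, then hand over to a local policy in that region) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D world, the "sticky" cells, the `move` and `push` actions, the function names, and the hand-coded `local_policy`, which merely stands in for a policy learned from expert demonstrations.

```python
from typing import List, Set, Tuple

State = int
Action = str


def true_step(s: State, a: Action) -> State:
    """Toy real dynamics (hypothetical): 'move' has no effect in sticky cells 3 and 4."""
    if a == "move":
        return s if s in (3, 4) else s + 1
    if a == "push":
        return s + 1
    return s


def model_step(s: State, a: Action) -> State:
    """Planner's approximate model: it does not know about the sticky cells."""
    return s + 1 if a == "move" else s


def plan(s: State, goal: State) -> List[Action]:
    """Trivial planner over the approximate model: just keep moving right."""
    return ["move"] * max(goal - s, 0)


def local_policy(s: State) -> Action:
    """Hand-coded stand-in for a model-free policy learned from demonstrations."""
    return "push"


def run_patched_planner(
    start: State, goal: State, max_steps: int = 50
) -> Tuple[State, List[Tuple[State, str, Action]], Set[State]]:
    """Follow the planner, and patch with the local policy where the model failed.

    Assumes start < goal; in this toy world the state never overshoots the goal.
    """
    s = start
    failed: Set[State] = set()  # states where a model failure was observed
    log: List[Tuple[State, str, Action]] = []
    for _ in range(max_steps):
        if s == goal:
            break
        if s in failed:
            source, a = "policy", local_policy(s)
        else:
            source, a = "planner", plan(s, goal)[0]
        predicted = model_step(s, a)
        s_next = true_step(s, a)
        # An unexpected transition under the planner marks a model failure.
        if source == "planner" and s_next != predicted:
            failed.add(s)
        log.append((s, source, a))
        s = s_next
    return s, log, failed


if __name__ == "__main__":
    final_state, log, failed = run_patched_planner(start=0, goal=6)
    for entry in log:
        print(entry)
    print("final state:", final_state, "patched states:", sorted(failed))
```

In this sketch the planner stays in control everywhere its model predicts correctly, and the local policy is used only in the states where a mismatch was observed, which mirrors the "patch only where needed" idea. A real system would need a tolerance-based failure detector over continuous states and a policy actually trained from demonstrations.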
