Hierarchical Decision Transformer

Paper title

Hierarchical Decision Transformer

Authors

André Correia, Luís A. Alexandre

Abstract

Sequence models in reinforcement learning require task knowledge to estimate the task policy. This paper presents a hierarchical algorithm for learning a sequence model from demonstrations. The high-level mechanism guides the low-level controller through the task by selecting sub-goals for the latter to reach. This sequence of sub-goals replaces the returns-to-go of previous methods, improving overall performance, especially in tasks with longer episodes and scarcer rewards. We validate our method on multiple tasks from the OpenAI Gym, D4RL and RoboMimic benchmarks. Our method outperforms the baselines in eight out of ten tasks of varied horizons and reward frequencies without prior task knowledge, showing the advantages of the hierarchical model approach for learning from demonstrations using a sequence model.
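
To make the contrast in the abstract concrete, the following minimal sketch shows how the per-timestep conditioning signal of a sequence model might be swapped from returns-to-go to sub-goals when the input sequence is assembled. The function names, the token layout and the sub-goal selector are assumptions made for illustration, not the authors' implementation. The abstract does not say how sub-goals are chosen, so the selector here is a placeholder that simply picks the state a fixed number of steps ahead.

```python
from typing import Any, List, Sequence, Tuple

State = Tuple[float, ...]
Action = int


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Sum of future rewards from each timestep (conditioning of prior methods)."""
    total = 0.0
    out: List[float] = []
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


def placeholder_subgoals(states: Sequence[State], lookahead: int) -> List[State]:
    """Hypothetical selector: the sub-goal at step t is the state `lookahead`
    steps later, clamped to the final state. This stands in for the
    high-level mechanism and is NOT the method from the paper."""
    last = len(states) - 1
    return [states[min(t + lookahead, last)] for t in range(len(states))]


def interleave(
    conditioning: Sequence[Any],
    states: Sequence[State],
    actions: Sequence[Action],
) -> List[Tuple[str, Any]]:
    """Build a (conditioning, state, action) token sequence, one triple per step."""
    seq: List[Tuple[str, Any]] = []
    for c, s, a in zip(conditioning, states, actions):
        seq.extend([("cond", c), ("state", s), ("action", a)])
    return seq


# Toy demonstration with a single sparse reward at the final step.
states = [(0.0,), (1.0,), (2.0,), (3.0,)]
actions = [1, 1, 1, 0]
rewards = [0.0, 0.0, 0.0, 1.0]

# Conditioning on returns-to-go requires reward information.
rtg_sequence = interleave(returns_to_go(rewards), states, actions)

# Conditioning on sub-goals uses only states from the demonstration.
subgoal_sequence = interleave(placeholder_subgoals(states, lookahead=2), states, actions)
```

In this toy trajectory the reward arrives only at the last step, so every return-to-go has the same value, while each sub-goal entry names a concrete state for the low-level controller to reach. The sub-goal variant also needs no reward values to build its input.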
