Paper Title
HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem
Paper Authors
Paper Abstract
In spite of the success of existing meta reinforcement learning methods, they still have difficulty learning a meta policy effectively for RL problems with sparse reward. In this respect, we develop a novel meta reinforcement learning framework for sparse reward RL problems, called Hyper-Meta RL (HMRL). It consists of three modules: a cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments; an environment-specific meta reward shaping module based on the meta state, which effectively extends the original sparse reward trajectory through cross-environment knowledge complementarity; and the meta policy, which consequently achieves better generalization and efficiency with the shaped meta reward. Experiments on sparse-reward environments demonstrate the superiority of HMRL in both transferability and policy learning efficiency.
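To make the abstract's three-module structure concrete, the following is a minimal PyTorch sketch of how the components could fit together. All class names, network sizes, and the potential-based form of the reward shaping are illustrative assumptions, not the authors' implementation.

# A minimal structural sketch (not the HMRL authors' code) of the three
# modules named in the abstract. Names, dimensions, and the potential-based
# shaping form are assumptions for illustration only.
import torch
import torch.nn as nn


class MetaStateEmbedding(nn.Module):
    """Cross-environment module: maps raw states from different environments
    into a common meta state space."""
    def __init__(self, state_dim: int, meta_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, meta_dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class MetaRewardShaping(nn.Module):
    """Environment-specific module: produces a shaped reward from the meta
    state to densify the original sparse reward signal."""
    def __init__(self, meta_dim: int):
        super().__init__()
        self.potential = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU(),
                                       nn.Linear(64, 1))

    def forward(self, meta_s, meta_s_next, sparse_r, gamma: float = 0.99):
        # Potential-based shaping is one common choice (an assumption here):
        # r_shaped = r + gamma * phi(s') - phi(s)
        return sparse_r + gamma * self.potential(meta_s_next) - self.potential(meta_s)


class MetaPolicy(nn.Module):
    """Meta policy acting on the shared meta state space; trained with the
    shaped meta reward so it can transfer across environments."""
    def __init__(self, meta_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim))

    def forward(self, meta_s: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(meta_s), dim=-1)

In this reading, the embedding is shared across environments while a separate reward-shaping head is kept per environment, and the policy only ever sees the common meta state plus the shaped (densified) reward.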