Paper Title
HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem
Paper Authors
Paper Abstract
In spite of the success of existing meta reinforcement learning methods, they still have difficulty learning a meta policy effectively for RL problems with sparse reward. In this respect, we develop a novel meta reinforcement learning framework for sparse reward RL problems, called Hyper-Meta RL (HMRL). It consists of three modules: a cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments; an environment-specific meta reward shaping module based on the meta state, which effectively extends the original sparse reward trajectory through cross-environment knowledge complementarity; and the meta policy, which consequently achieves better generalization and efficiency with the shaped meta reward. Experiments on sparse-reward environments demonstrate the superiority of HMRL in both transferability and policy learning efficiency.
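To make the abstract's three-module structure concrete, the following is a minimal PyTorch sketch of how the components could fit together. All class names, network sizes, and the potential-based form of the reward shaping are illustrative assumptions, not the authors' implementation.

# A minimal structural sketch (not the HMRL authors' code) of the three
# modules named in the abstract. Names, dimensions, and the potential-based
# shaping form are assumptions for illustration only.
import torch
import torch.nn as nn


class MetaStateEmbedding(nn.Module):
    """Cross-environment module: maps raw states from different environments
    into a common meta state space."""
    def __init__(self, state_dim: int, meta_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, meta_dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class MetaRewardShaping(nn.Module):
    """Environment-specific module: produces a shaped reward from the meta
    state to densify the original sparse reward signal."""
    def __init__(self, meta_dim: int):
        super().__init__()
        self.potential = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU(),
                                       nn.Linear(64, 1))

    def forward(self, meta_s, meta_s_next, sparse_r, gamma: float = 0.99):
        # Potential-based shaping is one common choice (an assumption here):
        # r_shaped = r + gamma * phi(s') - phi(s)
        return sparse_r + gamma * self.potential(meta_s_next) - self.potential(meta_s)


class MetaPolicy(nn.Module):
    """Meta policy acting on the shared meta state space; trained with the
    shaped meta reward so it can transfer across environments."""
    def __init__(self, meta_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(meta_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim))

    def forward(self, meta_s: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(meta_s), dim=-1)

In this reading, the embedding is shared across environments while a separate reward-shaping head is kept per environment, and the policy only ever sees the common meta state plus the shaped (densified) reward.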