Title
Nuclear Norm Maximization Based Curiosity-Driven Learning
Authors
Abstract
To handle the sparsity of extrinsic rewards in reinforcement learning, researchers have proposed intrinsic rewards, which enable the agent to learn skills that may come in handy for pursuing rewards in the future, for example by encouraging the agent to visit novel states. However, the intrinsic reward can be noisy due to undesirable stochasticity in the environment, and directly using noisy value predictions to supervise the policy is detrimental to learning performance and efficiency. Moreover, many previous studies employ the $\ell^2$ norm or variance to measure exploration novelty, which amplifies the noise due to the squaring operation. In this paper, we address the aforementioned challenges by proposing a novel curiosity method leveraging nuclear norm maximization (NNM), which quantifies the novelty of exploring the environment more accurately while providing high tolerance to noise and outliers. We conduct extensive experiments across a variety of benchmark environments, and the results suggest that NNM provides state-of-the-art performance compared with previous curiosity methods. On a subset of 26 Atari games, NNM achieves a human-normalized score of 1.09 when trained with only intrinsic rewards, doubling that of competitive intrinsic-reward-based approaches. Our code will be released publicly to enhance reproducibility.
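To illustrate the core idea, the following is a minimal sketch (not the paper's exact formulation) of using the nuclear norm of a batch of state embeddings as a novelty signal: the nuclear norm is the sum of singular values, so it grows with the diversity of the visited states while avoiding the squaring that amplifies noise in $\ell^2$- or variance-based measures. The function name and the use of raw, unnormalized embeddings are assumptions for illustration.

```python
import numpy as np

def nuclear_norm_reward(embeddings: np.ndarray) -> float:
    """Illustrative intrinsic reward: the nuclear norm (sum of singular
    values) of a (batch, feature) matrix of state embeddings. A larger
    value indicates higher diversity among the visited states. This is
    a simplified sketch, not the exact reward used in the paper."""
    # Singular values of the embedding matrix; no squaring is involved,
    # unlike l2-norm- or variance-based novelty measures.
    s = np.linalg.svd(embeddings, compute_uv=False)
    return float(s.sum())

# Toy usage: a batch of diverse states scores higher than the same
# state repeated, since repetition collapses the matrix rank.
rng = np.random.default_rng(0)
diverse = rng.standard_normal((8, 16))    # 8 distinct state embeddings
redundant = np.tile(diverse[:1], (8, 1))  # one state repeated 8 times
assert nuclear_norm_reward(diverse) > nuclear_norm_reward(redundant)
```

In practice the reward would be computed per transition batch and combined (or not) with extrinsic rewards; the sketch only shows why maximizing the nuclear norm encourages diverse exploration.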