Paper Title

MIME: Mutual Information Minimisation Exploration

Authors

Xu, Haitao, McCane, Brendan, Szymanski, Lech, Atkinson, Craig

Abstract

We show that reinforcement learning agents that learn by surprise (surprisal) get stuck at abrupt environmental transition boundaries because these transitions are difficult to learn. We propose a counter-intuitive solution that we call Mutual Information Minimising Exploration (MIME), where an agent learns a latent representation of the environment without trying to predict future states. We show that our agent performs significantly better over sharp transition boundaries while matching the performance of surprisal-driven agents elsewhere. In particular, we show state-of-the-art performance on difficult learning games such as Gravitar, Montezuma's Revenge and Doom.
