Paper Title
Paused Agent Replay Refresh
Paper Authors
Paper Abstract
Reinforcement learning algorithms have become more complex since the invention of target networks. Unfortunately, target networks have not kept up with this increased complexity, instead requiring approximate solutions to be computationally feasible. These approximations increase noise in the Q-value targets and in the replay sampling distribution. Paused Agent Replay Refresh (PARR) is a drop-in replacement for target networks that supports more complex learning algorithms without this need for approximation. Using a basic Q-network architecture, and refreshing the novelty values, target values, and replay sampling distribution, PARR gets 2500 points in Montezuma's Revenge after only 30.9 million Atari frames. Finally, interpreting PARR in the context of carbon-based learning offers a new reason for sleep.
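To make the pause-and-refresh idea concrete, here is a minimal sketch in PyTorch, assuming a small Q-network and an in-memory replay buffer. It is an illustration of the mechanism the abstract describes, not the paper's implementation: the names `ReplayItem`, `refresh_replay`, and the TD-error-based priority are hypothetical, and the novelty refresh is only indicated in a comment.

```python
# Sketch of pause-and-refresh in place of a target network (illustrative only).
import copy
import random
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ReplayItem:
    state: torch.Tensor
    action: int
    reward: float
    next_state: torch.Tensor
    done: bool
    target: float = 0.0    # Q-value target, rewritten at each refresh
    priority: float = 1.0  # sampling weight, rewritten at each refresh


def refresh_replay(buffer, online_net, gamma=0.99):
    """Pause the agent, snapshot the online network, and recompute every
    stored target and sampling priority with the snapshot, rather than
    maintaining a separate target network during training."""
    frozen = copy.deepcopy(online_net).eval()
    with torch.no_grad():
        for item in buffer:
            next_q = frozen(item.next_state).max().item()
            item.target = item.reward + (0.0 if item.done else gamma * next_q)
            # Hypothetical priority: TD error against the refreshed target.
            current_q = frozen(item.state)[item.action].item()
            item.priority = abs(item.target - current_q) + 1e-3
            # Novelty values would be recomputed here in the same pass
            # (omitted: the abstract does not specify the novelty measure).


def sample(buffer, batch_size):
    """Sample transitions according to the refreshed priorities."""
    weights = [item.priority for item in buffer]
    return random.choices(buffer, weights=weights, k=batch_size)


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    buffer = [ReplayItem(torch.randn(4), 0, 1.0, torch.randn(4), False)
              for _ in range(8)]
    refresh_replay(buffer, net)
    batch = sample(buffer, batch_size=2)
```

Between refreshes, training would minimize a loss against the stored `target` values, so no target-network forward pass is needed while the agent is learning or acting; the periodic pause during which everything is recomputed is what the abstract connects to sleep in carbon-based learners.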