Paper Title

Temporally-Extended ε-Greedy Exploration

Authors

Will Dabney, Georg Ostrovski, André Barreto

Abstract

Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem. This increase in complexity often comes at the expense of generality. Recent empirical studies suggest that, when applied to a broader set of domains, some sophisticated exploration methods are outperformed by simpler counterparts, such as ε-greedy. In this paper we propose an exploration algorithm that retains the simplicity of ε-greedy while reducing dithering. We build on a simple hypothesis: the main limitation of ε-greedy exploration is its lack of temporal persistence, which limits its ability to escape local optima. We propose a temporally extended form of ε-greedy that simply repeats the sampled action for a random duration. It turns out that, for many duration distributions, this suffices to improve exploration on a large set of domains. Interestingly, a class of distributions inspired by ecological models of animal foraging behaviour yields particularly strong performance.
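
The abstract only sketches the mechanism at a high level: with probability ε, sample a random action and then repeat it for a random duration; otherwise act greedily. The following is a minimal Python sketch of that idea, assuming a heavy-tailed (zeta-like) duration distribution as a stand-in for the foraging-inspired class the abstract alludes to. The names `TemporallyExtendedEpsilonGreedy` and `zeta_duration`, and the exponent parameter `mu`, are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def zeta_duration(mu=2.0, max_n=10_000, rng=None):
    """Sample a persistence duration n with P(n) proportional to n**(-mu).

    A heavy-tailed choice in the spirit of the Levy-flight foraging models
    mentioned in the abstract (assumed here for illustration).
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = np.arange(1, max_n + 1, dtype=float)
    p = n ** (-mu)
    return int(rng.choice(n, p=p / p.sum()))


class TemporallyExtendedEpsilonGreedy:
    """Sketch of temporally-extended epsilon-greedy action selection.

    With probability epsilon, sample an exploratory action and repeat it for a
    randomly drawn number of steps; otherwise act greedily on the Q-values.
    """

    def __init__(self, num_actions, epsilon=0.01, rng=None):
        self.num_actions = num_actions
        self.epsilon = epsilon
        self.rng = rng if rng is not None else np.random.default_rng()
        self.remaining = 0          # steps left in the current exploratory run
        self.persisted_action = None

    def act(self, q_values):
        if self.remaining > 0:      # keep repeating the sampled action
            self.remaining -= 1
            return self.persisted_action
        if self.rng.random() < self.epsilon:
            # Start a new exploratory run: random action, random duration.
            self.persisted_action = int(self.rng.integers(self.num_actions))
            self.remaining = zeta_duration(rng=self.rng) - 1
            return self.persisted_action
        return int(np.argmax(q_values))   # greedy action


# Example usage with dummy Q-value estimates.
agent = TemporallyExtendedEpsilonGreedy(num_actions=4, epsilon=0.1)
action = agent.act(np.array([0.1, 0.5, 0.2, 0.0]))
```

With a heavy-tailed duration distribution, most exploratory runs are short while a few are very long, which is what gives the exploration its temporal persistence while keeping the overall scheme as simple as ε-greedy.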
