Paper Title

Temporally-Extended ε-Greedy Exploration

Authors

Will Dabney, Georg Ostrovski, André Barreto

Abstract

Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem. This increase in complexity often comes at the expense of generality. Recent empirical studies suggest that, when applied to a broader set of domains, some sophisticated exploration methods are outperformed by simpler counterparts, such as ε-greedy. In this paper we propose an exploration algorithm that retains the simplicity of ε-greedy while reducing dithering. We build on a simple hypothesis: the main limitation of ε-greedy exploration is its lack of temporal persistence, which limits its ability to escape local optima. We propose a temporally extended form of ε-greedy that simply repeats the sampled action for a random duration. It turns out that, for many duration distributions, this suffices to improve exploration on a large set of domains. Interestingly, a class of distributions inspired by ecological models of animal foraging behaviour yields particularly strong performance.
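
The abstract only sketches the mechanism at a high level: with probability ε, sample a random action and then repeat it for a random duration; otherwise act greedily. The following is a minimal Python sketch of that idea, assuming a heavy-tailed (zeta-like) duration distribution as a stand-in for the foraging-inspired class the abstract alludes to. The names `TemporallyExtendedEpsilonGreedy` and `zeta_duration`, and the exponent parameter `mu`, are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def zeta_duration(mu=2.0, max_n=10_000, rng=None):
    """Sample a persistence duration n with P(n) proportional to n**(-mu).

    A heavy-tailed choice in the spirit of the Levy-flight foraging models
    mentioned in the abstract (assumed here for illustration).
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = np.arange(1, max_n + 1, dtype=float)
    p = n ** (-mu)
    return int(rng.choice(n, p=p / p.sum()))


class TemporallyExtendedEpsilonGreedy:
    """Sketch of temporally-extended epsilon-greedy action selection.

    With probability epsilon, sample an exploratory action and repeat it for a
    randomly drawn number of steps; otherwise act greedily on the Q-values.
    """

    def __init__(self, num_actions, epsilon=0.01, rng=None):
        self.num_actions = num_actions
        self.epsilon = epsilon
        self.rng = rng if rng is not None else np.random.default_rng()
        self.remaining = 0          # steps left in the current exploratory run
        self.persisted_action = None

    def act(self, q_values):
        if self.remaining > 0:      # keep repeating the sampled action
            self.remaining -= 1
            return self.persisted_action
        if self.rng.random() < self.epsilon:
            # Start a new exploratory run: random action, random duration.
            self.persisted_action = int(self.rng.integers(self.num_actions))
            self.remaining = zeta_duration(rng=self.rng) - 1
            return self.persisted_action
        return int(np.argmax(q_values))   # greedy action


# Example usage with dummy Q-value estimates.
agent = TemporallyExtendedEpsilonGreedy(num_actions=4, epsilon=0.1)
action = agent.act(np.array([0.1, 0.5, 0.2, 0.0]))
```

With a heavy-tailed duration distribution, most exploratory runs are short while a few are very long, which is what gives the exploration its temporal persistence while keeping the overall scheme as simple as ε-greedy.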
