Title
Provably adaptive reinforcement learning in metric spaces
Authors
Abstract
We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.