A Geometric Perspective on Self-Supervised Policy Adaptation

Authors

Cristian Bodnar, Karol Hausman, Gabriel Dulac-Arnold, Rico Jonschkowski

Abstract

One of the most challenging aspects of real-world reinforcement learning (RL) is the multitude of unpredictable and ever-changing distractions that could divert an agent from what it was tasked to do in its training environment. While an agent could learn from reward signals to ignore them, the complexity of the real world can make rewards hard to acquire or, at best, extremely sparse. A recent class of self-supervised methods has shown promise that reward-free adaptation under challenging distractions is possible. However, previous work focused on a short one-episode adaptation setting. In this paper, we consider a long-term adaptation setup that is more akin to the specifics of the real world and propose a geometric perspective on self-supervised adaptation. We empirically describe the processes that take place in the embedding space during this adaptation process, reveal some of its undesirable effects on performance, and show how they can be eliminated. Moreover, we theoretically study how actor-based and actor-free agents can further generalise to the target environment by manipulating the geometry of the manifolds described by the actor and critic functions.
