Paper Title
Generalized Hindsight for Reinforcement Learning
Paper Authors
Paper Abstract
One of the key reasons for the high sample complexity in reinforcement learning (RL) is the inability to transfer knowledge from one task to another. In standard multi-task RL settings, low-reward data collected while trying to solve one task provides little to no signal for solving that particular task and is hence effectively wasted. However, we argue that this data, which is uninformative for one task, is likely a rich source of information for other tasks. To leverage this insight and efficiently reuse data, we present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks. Intuitively, given a behavior generated under one task, Generalized Hindsight returns a different task that the behavior is better suited for. Then, the behavior is relabeled with this new task before being used by an off-policy RL optimizer. Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which we empirically demonstrate on a suite of multi-task navigation and manipulation tasks. Videos and code can be accessed here: https://sites.google.com/view/generalized-hindsight.
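To make the relabeling idea concrete, below is a minimal sketch of the hindsight relabeling step described in the abstract, not the authors' released code. It assumes tasks are parameterized by a value z, that a function reward_fn(trajectory, z) re-evaluates a stored trajectory's return under task z, and it uses a toy 1-D goal-reaching reward purely for illustration; the relabeled task would then be attached to the trajectory before it is passed to an off-policy RL optimizer.

```python
# Hypothetical sketch of hindsight task relabeling (assumed interface, not the paper's code).
# reward_fn(trajectory, z) returns the trajectory's return re-evaluated under task z.

def relabel_with_hindsight(trajectory, original_task, candidate_tasks, reward_fn):
    """Approximate inverse-RL step: among the candidate tasks, return the one
    under which the observed trajectory achieves the highest return."""
    best_task = original_task
    best_return = reward_fn(trajectory, original_task)
    for z in candidate_tasks:
        ret = reward_fn(trajectory, z)
        if ret > best_return:
            best_task, best_return = z, ret
    return best_task


# Toy usage: a 1-D "navigation" task where z is a goal position and the reward
# is the negative distance from the trajectory's final state to the goal.
if __name__ == "__main__":
    trajectory = [0.0, 0.4, 0.9]                    # visited states
    reward_fn = lambda traj, z: -abs(traj[-1] - z)  # higher is better
    z_new = relabel_with_hindsight(trajectory,
                                   original_task=0.0,
                                   candidate_tasks=[0.5, 1.0, 2.0],
                                   reward_fn=reward_fn)
    print(z_new)  # 1.0 -- the task this behavior was actually best suited for
```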