Paper Title

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

Paper Authors

Roberta Raileanu, Max Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

Paper Abstract

Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios, even when they are trained on many instances of semantically similar environments. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. However, different tasks tend to benefit from different kinds of data augmentation. In this paper, we compare three approaches for automatically finding an appropriate augmentation. These are combined with two novel regularization terms for the policy and value function, required to make the use of data augmentation theoretically sound for certain actor-critic algorithms. We evaluate our methods on the Procgen benchmark, which consists of 16 procedurally-generated environments, and show that it improves test performance by ~40% relative to standard RL algorithms. Our agent outperforms other baselines specifically designed to improve generalization in RL. In addition, we show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent, such as the background. Our implementation is available at https://github.com/rraileanu/auto-drac.
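The abstract mentions two regularization terms that make data augmentation sound for actor-critic methods: one keeping the policy consistent across augmented views of an observation, and one doing the same for the value function. Below is a minimal PyTorch sketch of what such regularizers could look like for a discrete-action actor-critic; `policy`, `value_fn`, `augment`, and `alpha_r` are hypothetical placeholder names, not the authors' API (see the linked repository for the actual implementation).

```python
# Minimal sketch of DrAC-style policy/value regularizers, assuming a
# discrete-action actor-critic where `policy(obs)` returns logits and
# `value_fn(obs)` returns state values. All names here are illustrative.
import torch
import torch.nn.functional as F

def drac_regularizers(policy, value_fn, augment, obs, alpha_r=0.1):
    """Return the weighted augmentation-consistency penalty on a batch
    of observations `obs`, given an augmentation function `augment`."""
    aug_obs = augment(obs)  # e.g. random crop, color jitter, cutout

    # Policy term: KL between the policy on the original observation
    # (treated as a fixed target) and the policy on the augmented one.
    with torch.no_grad():
        logp_orig = F.log_softmax(policy(obs), dim=-1)
    logp_aug = F.log_softmax(policy(aug_obs), dim=-1)
    g_pi = F.kl_div(logp_aug, logp_orig, log_target=True,
                    reduction="batchmean")

    # Value term: the critic should assign the augmented observation
    # the same value as the original one.
    with torch.no_grad():
        v_orig = value_fn(obs)
    g_v = F.mse_loss(value_fn(aug_obs), v_orig)

    return alpha_r * (g_pi + g_v)
```

In training, this penalty would be added to the usual actor-critic (e.g. PPO) loss, so that gradient steps also pull the policy and value outputs on augmented observations toward their un-augmented targets.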
