Paper Title

Generalization in Reinforcement Learning by Soft Data Augmentation

Paper Authors

Nicklas Hansen, Xiaolong Wang

Paper Abstract

Extensive efforts have been made to improve the generalization ability of Reinforcement Learning (RL) methods via domain randomization and data augmentation. However, as more factors of variation are introduced during training, optimization becomes increasingly challenging, and empirically may result in lower sample efficiency and unstable training. Instead of learning policies directly from augmented data, we propose SOft Data Augmentation (SODA), a method that decouples augmentation from policy learning. Specifically, SODA imposes a soft constraint on the encoder that aims to maximize the mutual information between latent representations of augmented and non-augmented data, while the RL optimization process uses strictly non-augmented data. Empirical evaluations are performed on diverse tasks from DeepMind Control Suite as well as a robotic manipulation task, and we find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
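To make the abstract's key idea concrete, below is a minimal sketch of how the soft constraint could be instantiated: a BYOL-style prediction loss that ties latent representations of augmented observations to those of non-augmented observations through an exponential-moving-average target network, while the RL loss itself only ever sees non-augmented data. This is an illustrative assumption about the implementation, not the authors' code; all names (`SODAHead`, `augment`, `feat_dim`, `proj_dim`, `tau`) are hypothetical.

```python
# Minimal sketch of a SODA-style auxiliary objective (hypothetical names).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SODAHead(nn.Module):
    """Predicts projected features of non-augmented observations from
    augmented ones; the RL objective is trained on clean data only."""

    def __init__(self, encoder: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.encoder = encoder  # shared with the RL agent
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        # Target branch: EMA copies that receive no gradients.
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.projector)
        for p in list(self.target_encoder.parameters()) + \
                 list(self.target_projector.parameters()):
            p.requires_grad = False

    def loss(self, obs: torch.Tensor, augment) -> torch.Tensor:
        # Online branch sees augmented data; target branch sees clean data.
        # `augment` is an assumed callable, e.g. random convolution or overlay.
        pred = self.predictor(self.projector(self.encoder(augment(obs))))
        with torch.no_grad():
            target = self.target_projector(self.target_encoder(obs))
        # Matching L2-normalized features acts as a soft constraint aligning
        # augmented and non-augmented latent representations.
        return F.mse_loss(F.normalize(pred, dim=-1), F.normalize(target, dim=-1))

    @torch.no_grad()
    def update_target(self, tau: float = 0.005) -> None:
        # Polyak/EMA update: target <- (1 - tau) * target + tau * online.
        for p, tp in zip(self.encoder.parameters(),
                         self.target_encoder.parameters()):
            tp.data.lerp_(p.data, tau)
        for p, tp in zip(self.projector.parameters(),
                         self.target_projector.parameters()):
            tp.data.lerp_(p.data, tau)
```

In use, one would add `head.loss(obs, augment)` as an auxiliary term alongside the standard RL update and call `head.update_target()` after each optimizer step; because gradients from augmented data reach the encoder only through this soft constraint, the policy optimization itself remains on non-augmented observations, which is the decoupling the abstract describes.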
