Paper Title
Group Equivariant Deep Reinforcement Learning
Paper Authors
Paper Abstract
In Reinforcement Learning (RL), Convolutional Neural Networks (CNNs) have been successfully applied as function approximators in Deep Q-Learning algorithms, which seek to learn action-value functions and policies in various environments. However, to date, there has been little work on learning symmetry-transformation equivariant representations of the input environment state. In this paper, we propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation equivariant Q-value approximation. We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment while requiring fewer parameters. Additionally, we show that they are robust to changes in the environment caused by affine transformations.
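To make "transformation equivariant Q-value approximation" concrete, here is a minimal NumPy sketch. It is not the paper's architecture: instead of an Equivariant CNN it symmetrizes an arbitrary linear Q-function by averaging over the four 90-degree rotations, which is a simpler way to obtain the same property. The grid size, the action order (up, right, down, left), and the names `base_q`, `rotate_q`, and `equivariant_q` are all assumptions made for illustration.

```python
import numpy as np

N_ACTIONS = 4  # assumed action order: up, right, down, left


def base_q(state, weights):
    """Arbitrary (non-equivariant) linear Q-function: one value per action."""
    return weights @ state.ravel()


def rotate_q(q, k=1):
    """Permute action values to match k counter-clockwise quarter turns.

    Under one such turn, up -> left, right -> up, down -> right, left -> down,
    so the value at index a moves to index (a - 1) mod 4.
    """
    return np.roll(q, -k)


def equivariant_q(state, weights):
    """Average base_q over the rotation group C4, undoing the action
    permutation for each rotated copy. The result satisfies
    Q(rotated state) == rotate_q(Q(state))."""
    total = np.zeros(N_ACTIONS)
    for k in range(4):
        rotated_state = np.rot90(state, k)  # k counter-clockwise quarter turns
        total += np.roll(base_q(rotated_state, weights), k)  # inverse permutation
    return total / 4


rng = np.random.default_rng(0)
weights = rng.normal(size=(N_ACTIONS, 25))  # hypothetical parameters
state = rng.normal(size=(5, 5))             # hypothetical 5x5 observation

q = equivariant_q(state, weights)
q_rot = equivariant_q(np.rot90(state), weights)

# Rotating the observation only permutes the Q-values; it does not change them.
assert np.allclose(q_rot, rotate_q(q))
```

The consequence of this property is that the Q-values for a rotated observation follow from those of the original without any additional data, which is one intuition for the sample-efficiency gains the abstract reports in a highly symmetric environment.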