Paper Title
Maximum Mutation Reinforcement Learning for Scalable Control
Paper Authors
Paper Abstract
Advances in Reinforcement Learning (RL) have demonstrated data efficiency and optimal control over large state spaces at the cost of scalable performance. Genetic methods, on the other hand, provide scalability but exhibit hyperparameter sensitivity in their evolutionary operations. However, a combination of the two methods has recently demonstrated success in scaling RL agents to high-dimensional action spaces. Parallel to recent developments, we present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm. We abstract exploration from exploitation by combining Evolution Strategies (ES) with Soft Actor-Critic (SAC). Through this lens, we enable dominant skill transfer between offspring by making use of soft winner selections and genetic crossovers in hindsight, and simultaneously reduce hyperparameter sensitivity in evolutions using the novel Automatic Mutation Tuning (AMT). AMT gradually replaces the entropy framework of SAC, allowing the population to succeed at the task while acting as randomly as possible, without making use of backpropagation updates. In a study of challenging locomotion tasks consisting of high-dimensional action spaces and sparse rewards, ESAC demonstrates improved performance and sample efficiency in comparison to the Maximum Entropy framework. Additionally, ESAC makes efficient use of hardware resources and incurs low algorithmic overhead. A complete implementation of ESAC can be found at karush17.github.io/esac-web/.
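To make the abstract's ingredients concrete, the sketch below shows a generic Evolution Strategies population loop with softmax-weighted ("soft winner") recombination and a simple automatic adjustment of the mutation scale. This is an illustrative assumption-based example only: the abstract does not specify ESAC's actual winner selection, crossover, or AMT update, and all names here (`evaluate_return`, `sigma`, `amt_rate`) are hypothetical rather than taken from the paper or its released implementation.

```python
import numpy as np

def evaluate_return(params):
    # Placeholder fitness: in practice this would roll out a SAC-style
    # policy parameterized by `params` and return the episodic reward.
    return -np.sum(params ** 2)

dim = 10                 # number of policy parameters
pop_size = 32            # population (offspring) size
sigma = 0.1              # mutation (perturbation) scale
amt_rate = 1.02          # hypothetical multiplicative tuning factor
theta = np.zeros(dim)    # mean policy parameters
best_return = -np.inf

for generation in range(100):
    # Sample mutations and evaluate the perturbed offspring.
    noise = np.random.randn(pop_size, dim)
    returns = np.array([evaluate_return(theta + sigma * n) for n in noise])

    # Soft winner selection: softmax-weighted recombination of offspring,
    # so better-performing perturbations dominate the parameter update.
    weights = np.exp(returns - returns.max())
    weights /= weights.sum()
    theta = theta + sigma * weights @ noise

    # Simple automatic mutation tuning (stand-in for AMT): grow the
    # mutation scale while the population keeps improving, encouraging
    # exploration without an explicit entropy bonus; shrink it otherwise.
    gen_best = returns.max()
    sigma = sigma * amt_rate if gen_best > best_return else sigma / amt_rate
    best_return = max(best_return, gen_best)
```

Note that, as in the abstract, no backpropagation is involved: the policy parameters are updated purely from sampled perturbations and their returns, which is what makes the approach easy to parallelize across workers.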