Paper Title


Deep RL With Information Constrained Policies: Generalization in Continuous Control

Authors

Malloy, Tyler, Sims, Chris R., Klinger, Tim, Liu, Miao, Riemer, Matthew, Tesauro, Gerald

Abstract


Biological agents learn and act intelligently in spite of a highly limited capacity to process and store information. Many real-world problems involve continuous control, which represents a difficult task for artificial intelligence agents. In this paper we explore the potential learning advantages a natural constraint on information flow might confer onto artificial agents in continuous control tasks. We focus on the model-free reinforcement learning (RL) setting and formalize our approach in terms of an information-theoretic constraint on the complexity of learned policies. We show that our approach emerges in a principled fashion from the application of rate-distortion theory. We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm and situate it within a broader family of RL algorithms such as the Soft Actor Critic (SAC) and Mutual Information Reinforcement Learning (MIRL) algorithm. Our experiments using continuous control tasks show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments. This is achieved in the CLAC model while displaying the high sample efficiency of similar methods.
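The abstract describes constraining the complexity of a learned policy via an information-theoretic penalty, in the spirit of rate-distortion theory. The toy sketch below is not the paper's CLAC algorithm; it is a minimal illustration of the underlying idea, assuming a tabular policy and a penalty on the mutual information I(S; A) between states and actions, so that a coefficient `beta` (a hypothetical name) trades expected reward against policy complexity.

```python
import numpy as np

def mutual_information(pi, p_s):
    """I(S; A) for a tabular policy pi[s, a] under state distribution p_s[s].

    Computed as sum_s p(s) * KL( pi(.|s) || p(a) ), where p(a) is the
    marginal action distribution induced by the policy.
    """
    marginal = p_s @ pi  # p(a) = sum_s p(s) * pi(a|s)
    with np.errstate(divide="ignore", invalid="ignore"):
        kl_terms = np.where(pi > 0, pi * np.log(pi / marginal), 0.0)
    return float(p_s @ kl_terms.sum(axis=1))

def capacity_limited_objective(pi, p_s, rewards, beta):
    """Expected reward minus beta * I(S; A).

    Larger beta favors simpler (more state-independent) policies,
    analogous to the rate-distortion trade-off the paper invokes.
    """
    expected_reward = float(p_s @ (pi * rewards).sum(axis=1))
    return expected_reward - beta * mutual_information(pi, p_s)

# Two states, two actions: a state-specific policy earns more reward
# but carries more information about the state than a uniform policy.
p_s = np.array([0.5, 0.5])
rewards = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
pi_specific = np.array([[0.99, 0.01],
                        [0.01, 0.99]])   # high I(S; A)
pi_uniform = np.full((2, 2), 0.5)        # I(S; A) = 0
```

With `beta = 0` the state-specific policy scores higher (pure reward maximization), while a sufficiently large `beta` makes the uniform, low-information policy preferable; the intuition is that the penalized policy generalizes better because it depends less on fine details of the training states.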
