Paper Title
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Paper Authors
Paper Abstract
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning. Our approach stems from the idea that the agent's experience in the source domain should look similar to its experience in the target domain. Building off of a probabilistic view of RL, we formally show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function. This modified reward function is simple to estimate by learning auxiliary classifiers that distinguish source-domain transitions from target-domain transitions. Intuitively, the modified reward function penalizes the agent for visiting states and taking actions in the source domain which are not possible in the target domain. Said another way, the agent is penalized for transitions that would indicate that the agent is interacting with the source domain, rather than the target domain. Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics. On discrete and continuous control tasks, we illustrate the mechanics of our approach and demonstrate its scalability to high-dimensional tasks.
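To make the classifier-based reward modification concrete, the snippet below is a minimal sketch of how such a correction could be computed during training in the source domain. It assumes two binary classifiers, one evaluated on transition tuples (s, a, s') and one on state-action pairs (s, a), each outputting the probability that its input came from the target domain; the agent then optimizes the original source-domain reward plus the log-ratio term below. The function name `reward_correction` and the exact form of the baseline term are a reconstruction from the abstract's description, not code released with the paper.

```python
import numpy as np

def reward_correction(p_target_sas, p_target_sa):
    """Hypothetical reward correction built from two binary classifiers.

    p_target_sas: classifier probability that the transition (s, a, s')
                  came from the target domain.
    p_target_sa:  classifier probability that the pair (s, a) came from
                  the target domain (baseline term that removes the
                  contribution of the state-action visitation itself).
    Returns the log-ratio term added to the source-domain reward.
    """
    eps = 1e-8  # avoid log(0)
    return (np.log(p_target_sas + eps) - np.log(1.0 - p_target_sas + eps)
            - np.log(p_target_sa + eps) + np.log(1.0 - p_target_sa + eps))

# A transition the classifier judges unlikely under the target dynamics
# (low p_target_sas) receives a negative correction, i.e. a penalty.
print(reward_correction(p_target_sas=0.1, p_target_sa=0.5))  # roughly -2.2
```

Under these assumptions, transitions that look characteristic of the source domain but implausible in the target domain are penalized, matching the intuition described in the abstract.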