Intrinsic Motivation for Encouraging Synergistic Behavior

Paper Title

Intrinsic Motivation for Encouraging Synergistic Behavior

Authors

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

Abstract

We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks, which are tasks where multiple agents must work together to achieve a goal they could not achieve individually. Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own. Thus, we propose to incentivize agents to take (joint) actions whose effects cannot be predicted via a composition of the predicted effects of each individual agent. We study two instantiations of this idea, one based on the true states encountered, and another based on a dynamics model trained concurrently with the policy. While the former is simpler, the latter has the benefit of being analytically differentiable with respect to the action taken. We validate our approach in robotic bimanual manipulation and multi-agent locomotion tasks with sparse rewards; we find that our approach yields more efficient learning than both 1) training with only the sparse reward and 2) using the typical surprise-based formulation of intrinsic motivation, which does not bias toward synergistic behavior. Videos are available on the project webpage: https://sites.google.com/view/iclr2020-synergistic.
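The core idea of the abstract can be made concrete with a small sketch. Let f_A and f_B be single-agent forward models and f_joint a joint forward model. Here the "composition of the predicted effects" is assumed to chain the single-agent models, f_B(f_A(s, a_A), a_B), and the distance is assumed to be Euclidean; neither choice is claimed to be the paper's exact formulation. The first instantiation rewards the gap between this composed prediction and the true next state; the second replaces the true next state with f_joint's prediction (with neural-network models, this is the variant that is differentiable in the action, which this pure-Python sketch does not show). The surprise baseline is the joint model's own prediction error. All function names and the toy "heavy object" task are hypothetical.

```python
import math
from typing import Callable, List

State = List[float]
Action = List[float]
# Hypothetical type aliases for the forward models used in this sketch.
SingleAgentModel = Callable[[State, Action], State]
JointModel = Callable[[State, Action, Action], State]


def distance(u: State, v: State) -> float:
    """Euclidean distance between two state vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def composed_prediction(f_a: SingleAgentModel, f_b: SingleAgentModel,
                        s: State, a_a: Action, a_b: Action) -> State:
    """Compose per-agent predictions: apply agent A's model, then agent B's.

    The sequential A-then-B order is an assumption made for this sketch.
    """
    return f_b(f_a(s, a_a), a_b)


def synergy_reward_true_state(f_a: SingleAgentModel, f_b: SingleAgentModel,
                              s: State, a_a: Action, a_b: Action,
                              s_next: State) -> float:
    """Instantiation 1: composed prediction vs. the observed next state."""
    return distance(composed_prediction(f_a, f_b, s, a_a, a_b), s_next)


def synergy_reward_model(f_a: SingleAgentModel, f_b: SingleAgentModel,
                         f_joint: JointModel, s: State,
                         a_a: Action, a_b: Action) -> float:
    """Instantiation 2: composed prediction vs. a joint dynamics model."""
    return distance(composed_prediction(f_a, f_b, s, a_a, a_b),
                    f_joint(s, a_a, a_b))


def surprise_reward(f_joint: JointModel, s: State, a_a: Action,
                    a_b: Action, s_next: State) -> float:
    """Baseline surprise: prediction error of the joint model itself."""
    return distance(f_joint(s, a_a, a_b), s_next)


# --- Hypothetical toy task: a heavy object that one arm cannot lift alone ---

def single_arm_model(s: State, a: Action) -> State:
    """Toy single-agent model: one arm acting alone leaves the object in place."""
    return list(s)


def true_step(s: State, a_a: Action, a_b: Action) -> State:
    """Toy true dynamics: the object rises by the weaker arm's push only."""
    return [s[0] + min(a_a[0], a_b[0])]


if __name__ == "__main__":
    s0 = [0.0]
    # true_step doubles as a perfectly trained joint model (an assumption).
    for name, (a_a, a_b) in [("both arms", ([1.0], [1.0])),
                             ("one arm", ([1.0], [0.0]))]:
        s1 = true_step(s0, a_a, a_b)
        print(name,
              synergy_reward_true_state(single_arm_model, single_arm_model,
                                        s0, a_a, a_b, s1),
              synergy_reward_model(single_arm_model, single_arm_model,
                                   true_step, s0, a_a, a_b),
              surprise_reward(true_step, s0, a_a, a_b, s1))
```

In this toy setting, only the joint lift earns a nonzero synergy reward (1.0 under both instantiations), while an already-accurate joint model yields zero surprise for every action, which illustrates why a surprise bonus on its own does not single out cooperative actions.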
