Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences

Paper title

Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences

Authors

Köster, Raphael; McKee, Kevin R.; Everett, Richard; Weidinger, Laura; Isaac, William S.; Hughes, Edward; Duéñez-Guzmán, Edgar A.; Graepel, Thore; Botvinick, Matthew; Leibo, Joel Z.

Abstract

Game-theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with model-based (MB) reinforcement learning. The former captures habitual and procedural learning while the latter captures choices taken via explicit planning and deduction. Some conventions (e.g. international treaties) are likely supported by cognition that resonates with the game-theoretic and MB accounts. However, convention formation may also occur via MF mechanisms like habit learning, though this possibility has been understudied. Here, we demonstrate that complex, large-scale conventions can emerge from MF learning mechanisms. This suggests that some conventions may be supported by habit-like cognition rather than explicit reasoning. We apply MF multi-agent reinforcement learning to a temporo-spatially extended game with incomplete information. In this game, large parts of the state space are reachable only by collective action. However, heterogeneity of tastes makes such coordinated action difficult: multiple equilibria are desirable for all players, but subgroups prefer a particular equilibrium over all others. This creates a coordination problem that can be solved by establishing a convention. We investigate start-up and free-rider subproblems as well as the effects of group size, intensity of intrinsic preference, and salience on the emergence dynamics of coordination conventions. Results of our simulations show agents establish and switch between conventions, even working against their own preferred outcome when doing so is necessary for effective coordination.
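The coordination problem described in the abstract can be illustrated with a toy sketch. This is not the paper's environment or learning algorithm: it assumes a one-shot, stateless game rather than a temporo-spatially extended one, and it uses tabular epsilon-greedy value learning as a stand-in for model-free learning. The function name `simulate` and all parameters (`threshold`, `bonus`, `alpha`, `epsilon`) are hypothetical and chosen only for illustration.

```python
import random


def simulate(n_agents=6, n_options=2, threshold=4, bonus=0.5,
             rounds=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Toy model-free learners in a coordination game with heterogeneous tastes."""
    rng = random.Random(seed)
    # Assumed taste assignment: agent i intrinsically prefers option i % n_options.
    preferred = [i % n_options for i in range(n_agents)]
    # One value estimate per agent and option; agents keep no model of the others.
    q = [[0.0] * n_options for _ in range(n_agents)]

    for _ in range(rounds):
        # Each agent picks epsilon-greedily from its own value estimates.
        choices = []
        for i in range(n_agents):
            if rng.random() < epsilon:
                choices.append(rng.randrange(n_options))
            else:
                best = max(q[i])
                ties = [a for a in range(n_options) if q[i][a] == best]
                choices.append(rng.choice(ties))

        counts = [choices.count(a) for a in range(n_options)]
        for i, a in enumerate(choices):
            # An option pays off only if enough agents chose it (collective action).
            reward = 0.0
            if counts[a] >= threshold:
                # Everyone in the group gains; agents who prefer this option gain more.
                reward = 1.0 + (bonus if a == preferred[i] else 0.0)
            # Incremental update toward the observed reward (no planning step).
            q[i][a] += alpha * (reward - q[i][a])

    # Greedy choice of each agent after learning.
    greedy = [max(range(n_options), key=lambda a: q[i][a]) for i in range(n_agents)]
    return q, preferred, greedy


if __name__ == "__main__":
    q, preferred, greedy = simulate()
    print("preferred options:", preferred)
    print("greedy choices:   ", greedy)
```

In this sketch, a convention would show up as most agents' greedy choices landing on the same option, including agents whose assumed preference is the other one. Whether that happens for a given seed and parameter setting is something to check by running it, not a result taken from the paper.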
