Optimal Learning Dynamics of Multi Agents in Restless Multiarmed Bandit Game

Paper title

Optimal Learning Dynamics of Multi Agents in Restless Multiarmed Bandit Game

Authors

Kazuaki Nakayama, Ryuzo Nakamura, Masato Hisakado, Shintaro Mori

Abstract

Social learning is learning through the observation of or interaction with other individuals; it is critical in the understanding of the collective behaviors of humans in social physics. We study the learning process of agents in a restless multiarmed bandit (rMAB). The binary payoff of each arm changes randomly and agents maximize their payoffs by exploiting an arm with payoff 1, searching the arm at random (individual learning), or copying an arm exploited by other agents (social learning). The system has Pareto and Nash equilibria in the mixed strategy space of social and individual learning. We study several models in which agents maximize their expected payoffs in the strategy space, and demonstrate analytically and numerically that the system converges to the equilibria. We also conducted an experiment and investigated whether human participants adopt the optimal strategy. In this experiment, three participants play the game. If the reward of each group is proportional to the sum of the payoffs, the median of the social learning rate almost coincides with that of the Pareto equilibrium.
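
The game described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: the function name `simulate_rmab`, the arm update rule (each arm is redrawn with probability `p_change` and is good with probability `p_good`), the rule that a copied arm is exploited from the next round, and all default parameter values are hypothetical choices made for illustration. Only the three actions (exploit, random search, copy) and the group size of three come from the abstract.

```python
import random


def simulate_rmab(social_rate, n_agents=3, n_arms=20, p_good=0.1,
                  p_change=0.05, n_rounds=10_000, seed=0):
    """Return the mean payoff per agent per round (illustrative model only).

    social_rate is the probability that an agent without a paying arm
    copies another agent (social learning) instead of searching at
    random (individual learning). All other parameters are assumptions.
    """
    rng = random.Random(seed)
    arms = [1 if rng.random() < p_good else 0 for _ in range(n_arms)]
    known = [None] * n_agents  # arm each agent currently believes pays 1
    total = 0
    for _ in range(n_rounds):
        # Restless step: each arm is independently redrawn with prob. p_change.
        for i in range(n_arms):
            if rng.random() < p_change:
                arms[i] = 1 if rng.random() < p_good else 0
        # Agents whose known arm still pays 1 exploit it this round.
        exploited = {a: known[a] for a in range(n_agents)
                     if known[a] is not None and arms[known[a]] == 1}
        new_known = list(known)
        for a in range(n_agents):
            if a in exploited:
                total += 1  # exploit: collect payoff 1
                continue
            new_known[a] = None  # known arm stopped paying, or none known
            if rng.random() < social_rate:
                # Social learning: copy an arm another agent exploits now.
                others = [arm for b, arm in exploited.items() if b != a]
                if others:
                    new_known[a] = rng.choice(others)
            else:
                # Individual learning: probe one arm chosen at random.
                candidate = rng.randrange(n_arms)
                if arms[candidate] == 1:
                    new_known[a] = candidate
        known = new_known
    return total / (n_agents * n_rounds)


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"social_rate={r:.2f}  mean payoff={simulate_rmab(r):.4f}")
```

In this toy version a group that only copies (`social_rate=1.0`) never earns anything, because nobody discovers a paying arm to copy in the first place. That is one intuition for why the equilibria discussed in the abstract lie in a mixed strategy space of social and individual learning.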
