论文标题
在贝叶斯游戏中对学习者进行战略
Strategizing against Learners in Bayesian Games
论文作者
论文摘要
我们研究了重复的两人游戏,其中一位学习者(学习者)采用了无需重新学习策略,而另一个则是优化器,是一个理性的实用程序最大化器。我们考虑一般的贝叶斯游戏,其中优化器和学习者的收益都可以取决于类型,该类型是从公开已知的分布中得出的,但私下向学习者透露。我们解决以下问题:(a)优化器可以保证获得的最低限度是什么,无论学习者使用的无重格学习算法是什么? (b)是否有学习算法以最低的限制限制优化器的收益? (c)这些算法可以有效地实现吗?在构建优化器学习者相互作用的理论时,我们定义了一种新的遗憾组合概念,称为Polytope交换后悔,这可能在其他环境中具有独立的兴趣。
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution, but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer can guarantee to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer payoff at this minimum? (c) can these algorithms be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, that could be of independent interest in other settings.