论文标题
我的公平强盗:用多人匪徒的最大敏公平分布式学习
My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits
论文作者
论文摘要
考虑n合作但不交流的球员,每个玩家都会转弯一臂之力。玩家对每个手臂都有不同的实用程序,可表示为NXM矩阵。这些公用事业是玩家未知的。在每个回合中,玩家选择了一只手臂,并为此嘈杂地观察他们的实用程序。但是,如果其他任何玩家选择了相同的臂,那么由于冲突,所有碰撞的球员都将获得零用性。玩家之间没有其他交流或协调。我们的目标是设计一种分布式的算法,该算法了解球员和武器之间的匹配,从而在最大程度地减少遗憾的同时,可以实现Max-Min公平。我们提出了一种算法,并证明了遗憾的是,最佳选择到$ \ log \ log t $ factor。这是第一个具有(近)订单最佳遗憾的Max-Min公平多人强盗算法。
Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an NxM matrix. These utilities are unknown to the players. In each turn players select an arm and receive a noisy observation of their utility for it. However, if any other players selected the same arm that turn, all colliding players will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $\log\log T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.