反复拍卖中无regret招标算法的收敛分析

论文标题

反复拍卖中无regret招标算法的收敛分析

Convergence Analysis of No-Regret Bidding Algorithms in Repeated Auctions

论文作者

Feng, Zhe, Guruganesh, Guru, Liaw, Christopher, Mehta, Aranyak, Sethi, Abhishek

论文摘要

在文献中已经广泛研究了游戏与无重格算法之间的联系。一个基本的结果是，当所有玩家都采用无重组策略时，这会产生一系列动作，其时间平均水平是游戏的粗略平衡。但是，对于存在多个均衡的情况，关于平衡选择的知之甚少。在这项工作中，我们研究了拍卖中无重组招标算法的收敛性。除了具有理论上的兴趣外，拍卖中的招标动态也是一个重要的问题。我们研究竞标者之间的重复游戏，其中在每个时间步骤中出售一个项目，而投标人的价值是从未知分布中得出的。我们表明，如果投标人使用任何基于均值的学习规则，那么投标人在第二次价格拍卖中以高可能性收敛于真实的纯纳什均衡，而在多锁定环境中的VCG拍卖中，以及在第一笔价格拍卖中与贝叶斯nash均衡。我们注意到，基于平均值的算法涵盖了多种已知的NO-Regret算法，例如EXP3，UCB，$ε$ -Greedy等。而且，我们分析了这种学习算法所产生的个体迭代的收敛性，与序列的时间平均水平相对。我们的实验证实了我们的理论发现，当我们使用其他策略（例如深Q学习）时，也发现了类似的融合。

The connection between games and no-regret algorithms has been widely studied in the literature. A fundamental result is that when all players play no-regret strategies, this produces a sequence of actions whose time-average is a coarse-correlated equilibrium of the game. However, much less is known about equilibrium selection in the case that multiple equilibria exist. In this work, we study the convergence of no-regret bidding algorithms in auctions. Besides being of theoretical interest, bidding dynamics in auctions is an important question from a practical viewpoint as well. We study repeated game between bidders in which a single item is sold at each time step and the bidder's value is drawn from an unknown distribution. We show that if the bidders use any mean-based learning rule then the bidders converge with high probability to the truthful pure Nash Equilibrium in a second price auction, in VCG auction in the multi-slot setting and to the Bayesian Nash equilibrium in a first price auction. We note mean-based algorithms cover a wide variety of known no-regret algorithms such as Exp3, UCB, $ε$-Greedy etc. Also, we analyze the convergence of the individual iterates produced by such learning algorithms, as opposed to the time-average of the sequence. Our experiments corroborate our theoretical findings and also find a similar convergence when we use other strategies such as Deep Q-Learning.

下载PDF全文

下载文献需遵守相关版权规定

论文标题