论文标题

通过预测的Blackwell的可接近性来解决更快的游戏解决:连接遗憾匹配和镜子下降

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

论文作者

Farina, Gabriele, Kroer, Christian, Sandholm, Tuomas

论文摘要

Blackwell的可允许性是用于推理具有矢量值得回报的重复游戏的框架。我们介绍了预测性Blackwell的可接近性,其中给出了下一个回报向量的估计,并且决策者试图根据该估计器的准确性来实现更好的性能。为了得出获得预测性Blackwell可接近性的算法,我们首先显示了四种知名算法之间的强大联系。遵循规范化领导者(FTRL)和在线镜像下降(OMD)是在线凸优化中最普遍的遗憾最小化器。尽管存在这种流行,但在解决大规模游戏的实践中,人们首选遗憾的匹配(RM)和遗憾匹配+(RM+)算法(因为当地的后悔最小化的人在反事实遗憾的最小化框架中)。我们证明RM和RM+是运行FTRL和OMD所产生的算法,以在基础的Blackwell可接近性游戏中始终选择半空间。通过将FTRL或OMD的预测变体应用于此连接,我们获得了Blackwell的预测可接近性算法以及RM和RM+的预测变体。在18个常见的零和大型基准游戏的实验中,我们表明,预测性RM+以及反事实后的遗憾最小化比以前最快的算法(CFR+,DCFR,LCFR,LCFR)更快地收敛速度,但在所有扑克游戏中以外的两个游戏中,有时是两个或多个巨大的次数。

Blackwell approachability is a framework for reasoning about repeated games with vector-valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the next payoff vector is given, and the decision maker tries to achieve better performance based on the accuracy of that estimator. In order to derive algorithms that achieve predictive Blackwell approachability, we start by showing a powerful connection between four well-known algorithms. Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization. In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework). We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game. By applying the predictive variants of FTRL or OMD to this connection, we obtain predictive Blackwell approachability algorithms, as well as predictive variants of RM and RM+. In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the poker games, sometimes by two or more orders of magnitude.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源