Monte-Carlo Tree Search as Regularized Policy Optimization

Paper title

Monte-Carlo Tree Search as Regularized Policy Optimization

Authors

Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

Abstract

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
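
The abstract does not spell out the regularized problem itself. As a minimal sketch only, assume it takes the form "maximize qᵀy − λ·KL(π_θ, y) over the probability simplex", where q holds the search's action-value estimates, π_θ is the network's prior policy (assumed strictly positive, as a softmax output is), and λ > 0 is a regularization weight that is simply passed in here. Under those assumptions the exact maximizer has the form y[a] = λ·π_θ[a] / (α − q[a]) for a single scalar α, which a one-dimensional bisection can find. The names `regularized_policy` and `objective` are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List


def regularized_policy(q: List[float], prior: List[float], lam: float,
                       iters: int = 100) -> List[float]:
    """Maximize sum_a q[a]*y[a] - lam * KL(prior || y) over the simplex.

    Assumption: lam > 0 and every prior entry is strictly positive.
    The optimum is then y[a] = lam * prior[a] / (alpha - q[a]), where the
    scalar alpha is chosen so that the entries of y sum to 1.
    """
    if lam <= 0 or any(p <= 0 for p in prior):
        raise ValueError("need lam > 0 and a strictly positive prior")

    def mass(alpha: float) -> float:
        # Total probability implied by a candidate alpha; strictly
        # decreasing in alpha on the bracket used below.
        return sum(lam * p / (alpha - qa) for qa, p in zip(q, prior))

    # At lo the largest term equals 1, so mass(lo) >= 1.
    # At hi every term is at most prior[a], so mass(hi) <= 1.
    lo = max(qa + lam * p for qa, p in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid  # too much mass: alpha must increase
        else:
            hi = mid  # mass <= 1: alpha may decrease
    y = [lam * p / (hi - qa) for qa, p in zip(q, prior)]
    total = sum(y)
    return [v / total for v in y]  # remove residual bisection error


def objective(q: List[float], prior: List[float], lam: float,
              y: List[float]) -> float:
    """Regularized objective q.y - lam * KL(prior || y) (hypothetical helper)."""
    kl = sum(p * math.log(p / v) for p, v in zip(prior, y))
    return sum(qa * v for qa, v in zip(q, y)) - lam * kl


if __name__ == "__main__":
    q = [1.0, 0.5, 0.0]
    prior = [0.2, 0.3, 0.5]
    for lam in (10.0, 0.5, 0.01):
        print(lam, [round(v, 3) for v in regularized_policy(q, prior, lam)])
```

With a large λ the result stays close to the prior, and as λ shrinks it concentrates on the action with the largest q. Bisection is used because the implied mass is monotone in α on a bracket that is known in closed form, so convergence is guaranteed without tuning.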
