Polygames: Improved Zero Learning

Paper title

Polygames: Improved Zero Learning

Authors

Cazenave, Tristan; Chen, Yen-Chi; Chen, Guan-Wei; Chen, Shi-Yu; Chiu, Xian-Dong; Dehos, Julien; Elsa, Maria; Gong, Qucheng; Hu, Hengyuan; Khalidov, Vasil; Li, Cheng-Ling; Lin, Hsin-I; Lin, Yu-Jin; Martinet, Xavier; Mella, Vegard; Rapin, Jeremy; Roziere, Baptiste; Synnaeve, Gabriel; Teytaud, Fabien; Teytaud, Olivier; Ye, Shi-Cheng; Ye, Yi-Jun; Yen, Shi-Jim; Zagoruyko, Sergey

Abstract

Since DeepMind's AlphaZero, zero learning has quickly become the state-of-the-art method for many board games. It can be improved by using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots that are independent of the board size. Training can be made more robust by keeping track of the best checkpoints during training and by training against them. Using these features, we release Polygames, our framework for zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex on a 19x19 board, which was often said to be intractable for zero learning, and at Havannah. We also won several first places at the TAAI competitions.
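
The board-size independence mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from Polygames: the function names, the single 3x3 kernel, and its weight values are all assumptions chosen for illustration, and it uses plain Python rather than a deep learning library. It only shows why a convolution (one output per cell) plus global pooling (one scalar per feature map) lets the same weights run on boards of different sizes.

```python
# Illustrative sketch only: every name here is hypothetical, not a Polygames API.

def conv3x3_same(board, kernel):
    """Apply a 3x3 kernel with zero padding; the output has the input's shape."""
    h, w = len(board), len(board[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += kernel[dr + 1][dc + 1] * board[rr][cc]
            out[r][c] = total
    return out


def global_avg_pool(feature_map):
    """Collapse a feature map of any size into a single scalar."""
    cells = [v for row in feature_map for v in row]
    return sum(cells) / len(cells)


def forward(board, kernel):
    """Return (policy_logits, value_feature) for a board of any size."""
    features = conv3x3_same(board, kernel)
    policy_logits = features                    # one logit per cell, board-shaped
    value_feature = global_avg_pool(features)   # one scalar, whatever the board size
    return policy_logits, value_feature


# Nine weights with arbitrary values; the weight count never depends on board size.
KERNEL = [[0.0, 0.1, 0.0],
          [0.1, 0.5, 0.1],
          [0.0, 0.1, 0.0]]


def empty_board(n):
    return [[0.0] * n for _ in range(n)]


small = empty_board(5)
small[2][2] = 1.0          # a single stone in the centre of a 5x5 board
large = empty_board(19)
large[9][9] = 1.0          # a single stone in the centre of a 19x19 board

p5, v5 = forward(small, KERNEL)
p19, v19 = forward(large, KERNEL)
```

A fully connected layer would instead need a weight matrix whose shape is tied to the number of cells, so it could not be reused when the board grows from 5x5 to 19x19.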
