Paper Title
Learning to branch with Tree MDPs
Paper Authors
Paper Abstract
State-of-the-art Mixed Integer Linear Program (MILP) solvers combine systematic tree search with a plethora of hard-coded heuristics, such as the branching rule. The idea of learning branching rules from data has received increasing attention recently, and promising results have been obtained by learning fast approximations of the strong branching expert. In this work, we instead propose to learn branching rules from scratch via Reinforcement Learning (RL). We revisit the work of Etheve et al. (2020) and propose tree Markov Decision Processes, or tree MDPs, a generalization of temporal MDPs that provides a more suitable framework for learning to branch. We derive a tree policy gradient theorem, which exhibits a better credit assignment compared to its temporal counterpart. We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs.
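For intuition only, the following is a rough sketch of how a tree policy gradient can differ from the standard temporal one; the notation (node states $s_i$, actions $a_i$, subtree $T(i)$, subtree return $G_i$) is illustrative and is not taken from the paper's exact statement of its theorem.

% Temporal policy gradient (standard REINFORCE form): each action a_t
% is credited with the full return from time t onward.
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \right], \qquad G_t = \sum_{t' \ge t} \gamma^{\,t'-t} r_{t'}.

% Tree-MDP variant (illustrative sketch, not the paper's statement):
% a branching decision at node i only influences the subtree T(i) it
% creates, so the credit term becomes a subtree return rather than the
% return of the remaining temporal trajectory.
\nabla_\theta J(\theta) = \mathbb{E}\!\left[ \sum_{i \in \text{tree}} \nabla_\theta \log \pi_\theta(a_i \mid s_i)\, G_i \right], \qquad G_i = \sum_{j \in T(i)} r_j.

Under this kind of decomposition, rewards accrued in one subtree do not contribute gradient signal to branching decisions taken in a disjoint subtree, which is the sense in which credit assignment can be tighter than in the temporal formulation.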