通过蒙特卡洛森林搜索的UNSAT求解器合成

论文标题

通过蒙特卡洛森林搜索的UNSAT求解器合成

UNSAT Solver Synthesis via Monte Carlo Forest Search

论文作者

Cameron, Chris, Hartford, Jason, Lundy, Taylor, Truong, Tuan, Milligan, Alan, Chen, Rex, Leyton-Brown, Kevin

论文摘要

我们介绍了用于在{TEER MDPS}中学习政策的一类增强学习算法（RL）算法的Monte Carlo Forest Search（MCFS），为此，策略执行涉及穿越指数大小的树。此类问题的例子包括证明SAT公式的不满；计算令人满意的SAT公式的解决方案的数量；并找到混合企业计划的最佳解决方案。 MCFS算法可以看作是蒙特卡洛树搜索（MCT）的扩展，而不是在树上找到一条好的路径（解决方案），而是要在候选树的森林中找到一棵小树。我们以一种算法来实例化和评估我们的想法，即我们将Knuth Synthesis（一种MCFS算法）学习，该算法学习了DPLL分支策略，以解决布尔值满意度（SAT）问题，目的是在给定的不适合问题的问题实例的给定分布上实现良好的平均案例性能。 Knuth合成是第一种RL方法，是避免在指数尺寸的树中避免政策评估的过高成本，利用两个关键思想：首先，我们通过随机采样路径和测量其长度来估算树木的大小，借鉴了由于Knuth引起的公正近似值（1975年）；其次，我们在用户定义的深度上查询一个强大的求解器，而不是在整棵树上学习政策，将我们的政策搜索集中在早期决策上，这些决策为减少树木的大小提供了最大的潜力。我们在三个众所周知的SAT分布上匹配或超过了强大的基线表现，面临的问题比以前的RL研究中提到的问题更具挑战性。

We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in {tree MDPs}, for which policy execution involves traversing an exponential-sized tree. Examples of such problems include proving unsatisfiability of a SAT formula; counting the number of solutions of a satisfiable SAT formula; and finding the optimal solution to a mixed-integer program. MCFS algorithms can be seen as extensions of Monte Carlo Tree Search (MCTS) to cases where, rather than finding a good path (solution) within a tree, the problem is to find a small tree within a forest of candidate trees. We instantiate and evaluate our ideas in an algorithm that we dub Knuth Synthesis, an MCFS algorithm that learns DPLL branching policies for solving the Boolean satisfiability (SAT) problem, with the objective of achieving good average-case performance on a given distribution of unsatisfiable problem instances. Knuth Synthesis is the first RL approach to avoid the prohibitive costs of policy evaluations in an exponentially-sized tree, leveraging two key ideas: first, we estimate tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975); second, we query a strong solver at a user-defined depth rather than learning a policy across the whole tree, to focus our policy search on early decisions that offer the greatest potential for reducing tree size. We matched or exceeded the performance of a strong baseline on three well-known SAT distributions, facing problems that were two orders of magnitude more challenging than those addressed in previous RL studies.

下载PDF全文

下载文献需遵守相关版权规定

论文标题