Paper Title

A Differentiable Loss Function for Learning Heuristics in A*

Authors

Leah Chrestien, Tomáš Pevný, Antonín Komenda, Stefan Edelkamp

Abstract

Optimization of heuristic functions for the A* algorithm, realized by deep neural networks, is usually done by minimizing the square root loss of the estimate of the cost-to-goal values. This paper argues that this does not necessarily lead to a faster search with the A* algorithm, since its execution relies on relative values rather than absolute ones. As a mitigation, we propose an L* loss, which upper-bounds the number of excessively expanded states inside the A* search. When used to optimize state-of-the-art deep neural networks for automated planning in maze domains such as Sokoban and maze with teleports, the L* loss significantly improves the fraction of solved problems and the quality of the found plans, and reduces the number of expanded states to approximately 50%.
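The contrast the abstract draws can be sketched in a few lines: the usual objective regresses the learned heuristic onto absolute cost-to-goal values, while A*'s behaviour depends only on the relative ordering of f = g + h. The snippet below is a minimal illustration of these two views; `rank_violations` is a hypothetical pairwise-ordering count for intuition only, not the paper's actual L* loss definition.

```python
import numpy as np

def squared_loss(h_pred, cost_to_goal):
    # Usual regression objective: penalize absolute deviation of the
    # learned heuristic h(s) from the true cost-to-goal h*(s).
    return float(np.mean((np.asarray(h_pred) - np.asarray(cost_to_goal)) ** 2))

def rank_violations(f_on_path, f_off_path):
    # A*'s expansion order depends only on the relative ordering of
    # f(s) = g(s) + h(s): count pairs where an off-solution-path state
    # would be expanded no later than a state on the solution path.
    # (Hypothetical illustration of the ranking view, not the L* loss.)
    return sum(1 for fp in f_on_path for fq in f_off_path if fq <= fp)
```

The point the abstract makes is visible here: shrinking the regression error does not by itself shrink the number of ordering violations, and it is the violations that translate into excess state expansions.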
