Visibility Optimization for Surveillance-Evasion Games

Paper Title

Visibility Optimization for Surveillance-Evasion Games

Authors

Ly, Louis; Tsai, Yen-Hsi Richard

Abstract

We consider surveillance-evasion differential games, where a pursuer must try to constantly maintain visibility of a moving evader. The pursuer loses as soon as the evader becomes occluded. Optimal controls for the game can be formulated as a Hamilton-Jacobi-Isaacs equation. We use an upwind scheme to compute the feedback value function, which corresponds to the end-game time of the differential game. Although the value function enables optimal controls, it is prohibitively expensive to compute, even for a single pursuer and a single evader on a small grid. We consider a discrete variant of the surveillance-evasion game. We propose two locally optimal strategies based on the static value function for the surveillance-evasion game with multiple pursuers and evaders. We show that Monte Carlo tree search and self-play reinforcement learning can train a deep neural network to generate reasonable strategies for on-line game play. Given enough computational resources and offline training time, the proposed model can continue to improve its policies and efficiently scale to higher resolutions.
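
To make the losing condition concrete, the following is a minimal sketch of an occlusion test for a discrete variant of the game. It is an illustration only, not the authors' method: it assumes the environment is an occupancy grid (1 marks an obstacle cell, 0 a free cell) and approximates line of sight with a Bresenham ray between cell centers. The names `line_of_sight` and `evader_visible` are hypothetical.

```python
def line_of_sight(grid, a, b):
    """Return True if no obstacle cell lies on the Bresenham line from a to b.

    Assumption: grid[r][c] == 1 marks an obstacle, 0 marks free space,
    and a, b are (row, col) tuples inside the grid.
    """
    (r0, c0), (r1, c1) = a, b
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    sr = 1 if r1 >= r0 else -1
    sc = 1 if c1 >= c0 else -1
    err = dr - dc
    r, c = r0, c0
    while True:
        if grid[r][c] == 1:
            return False  # the ray hit an obstacle: b is occluded from a
        if (r, c) == (r1, c1):
            return True   # reached the target cell without obstruction
        e2 = 2 * err
        if e2 > -dc:
            err -= dc
            r += sr
        if e2 < dr:
            err += dr
            c += sc


def evader_visible(grid, pursuers, evader):
    """The pursuers keep the game alive while at least one of them sees the evader."""
    return any(line_of_sight(grid, p, evader) for p in pursuers)


# Hypothetical 5x5 map with a wall segment in column 2.
grid = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
blocked = evader_visible(grid, [(0, 0)], (0, 4))          # wall lies between them
rescued = evader_visible(grid, [(0, 0), (4, 0)], (4, 4))  # second pursuer has a clear row
```

In this sketch the game would end at the first time step where `evader_visible` returns False. With several pursuers, one clear line of sight is taken to be enough, which is an assumption about how visibility is shared in the multi-pursuer setting.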
