Gradient Flows for Regularized Stochastic Control Problems

Paper title

Gradient Flows for Regularized Stochastic Control Problems

Authors

David Šiška, Łukasz Szpruch

Abstract

This paper studies stochastic control problems in which the action space is taken to be the space of probability measures and the objective is penalised by the relative entropy. We identify a suitable metric space on which we construct a gradient flow for the measure-valued control process, in the set of admissible controls, along which the cost functional is guaranteed to decrease. It is shown that any invariant measure of this gradient flow satisfies the Pontryagin optimality principle. If the problem is sufficiently convex, the gradient flow converges exponentially fast. Furthermore, the optimal measure-valued control process admits a Bayesian interpretation, which means that one can incorporate prior knowledge when solving such stochastic control problems. This work is motivated by a desire to extend the theoretical underpinning for the convergence of stochastic gradient type algorithms widely employed in the reinforcement learning community to solve control problems.
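
To make the abstract's claims concrete, the following is a minimal toy sketch and not the construction used in the paper. It assumes a static problem with a finite action set, a linear cost, and a relative-entropy penalty with weight `tau` against a reference measure `prior`. In this simplified setting the minimiser is known in closed form: it is the prior reweighted by `exp(-cost / tau)`, which mirrors the Bayesian reading mentioned above. The iteration shown is a discrete-time entropic (mirror-descent) scheme chosen for illustration; it is not the metric space or gradient flow built in the paper. All names (`regularised_cost`, `gibbs_minimiser`, `entropic_descent`) and all numerical values are hypothetical.

```python
import numpy as np


def regularised_cost(m, cost, prior, tau):
    """Linear cost plus tau * KL(m || prior) on a finite action set."""
    return float(m @ cost + tau * np.sum(m * np.log(m / prior)))


def gibbs_minimiser(cost, prior, tau):
    """Closed-form minimiser of the toy objective: prior * exp(-cost / tau), normalised."""
    weights = prior * np.exp(-(cost - cost.min()) / tau)  # shift by min for stability
    return weights / weights.sum()


def entropic_descent(cost, prior, tau, step=0.1, n_steps=400):
    """Multiplicative (mirror-descent) updates on the probability simplex.

    Assumes step <= 1 / tau, which makes the toy objective non-increasing
    along the iteration.
    """
    m = prior.copy()
    history = [regularised_cost(m, cost, prior, tau)]
    for _ in range(n_steps):
        # First variation of the objective at m, up to an additive constant.
        grad = cost + tau * np.log(m / prior)
        m = m * np.exp(-step * grad)
        m /= m.sum()  # renormalise so m stays a probability vector
        history.append(regularised_cost(m, cost, prior, tau))
    return m, history


if __name__ == "__main__":
    cost = np.array([1.0, 0.2, 0.7, 0.4])  # hypothetical per-action costs
    prior = np.full(4, 0.25)               # hypothetical uniform prior
    m, history = entropic_descent(cost, prior, tau=0.5)
    print("iterate:       ", m)
    print("closed form:   ", gibbs_minimiser(cost, prior, tau=0.5))
    print("cost start/end:", history[0], history[-1])
```

In this toy case the log-weights contract towards those of the closed-form minimiser by a factor `1 - step * tau` per step, a finite-dimensional analogue of the exponential convergence that the abstract states for sufficiently convex problems.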
