Paper Title
Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls
Paper Authors
Paper Abstract
In this paper, we propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls. Our method is based on a new class of Hamilton-Jacobi-Bellman (HJB) equations derived from applying the dynamic programming principle to continuous-time Q-functions. A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics. We identify the condition under which the Q-function estimated by this algorithm converges to the optimal Q-function. For practical implementation, we propose the Hamilton-Jacobi DQN, which extends the idea of deep Q-networks (DQN) to our continuous control setting. This approach does not require actor networks or numerical solutions to optimization problems for greedy actions since the HJB equation provides a simple characterization of optimal controls via ordinary differential equations. We empirically demonstrate the performance of our method through benchmark tasks and high-dimensional linear-quadratic problems.
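To illustrate the "semi-discrete" idea of building a Q-learning target from data sampled in discrete time while discounting in continuous time, here is a minimal, hypothetical sketch. It is not the authors' algorithm: the function name `semi_discrete_target`, the scalar setup, and the discount rate `rho` are all assumptions for illustration; the paper uses neural-network Q-functions over continuous states and actions.

```python
import numpy as np

def semi_discrete_target(reward, q_next_max, dt, rho):
    """Illustrative one-step bootstrapped target from a transition
    observed over a sampling interval of dt seconds.

    reward      -- running reward rate observed at the current state/action
    q_next_max  -- max_a' Q(x_{t+dt}, a'), the greedy value at the next state
    dt          -- sampling interval (seconds)
    rho         -- continuous-time discount rate (assumed parameter)
    """
    gamma_dt = np.exp(-rho * dt)  # discrete discount factor induced by rate rho
    # Accumulated reward over [t, t+dt], approximated as reward * dt,
    # plus the discounted greedy value at the sampled next state.
    return reward * dt + gamma_dt * q_next_max

# Example: reward rate 1.0 over dt = 0.1 s, next-state greedy value 10.0, rho = 0.5
target = semi_discrete_target(1.0, 10.0, dt=0.1, rho=0.5)
```

Note that as `dt` shrinks, the target varies continuously with the sampling interval, which is the sense in which no discretization of the system dynamics is imposed; only the data collection is discrete.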