D2C 2.0: Decoupled Data-Based Approach for Learning to Control Stochastic Nonlinear Systems via Model-Free ILQR

Paper Title

D2C 2.0: Decoupled Data-Based Approach for Learning to Control Stochastic Nonlinear Systems via Model-Free ILQR

Authors

Karthikeya S. Parunandi, Aayushman Sharma, Suman Chakravorty, Dileep Kalathil

Abstract

In this paper, we propose a structured linear parameterization of a feedback policy to solve the model-free stochastic optimal control problem. This parameterization is corroborated by a decoupling principle that is shown, both in theory and by empirical analyses, to be near-optimal under a small noise assumption. Further, we incorporate a model-free version of the Iterative Linear Quadratic Regulator (ILQR) into our framework in a sample-efficient manner. Simulations on systems spanning a range of complexities show that the resulting algorithm is able to harness the superior second-order convergence properties of ILQR. As a result, it is fast and scales to a wide variety of higher-dimensional systems. Comparisons are made with a state-of-the-art reinforcement learning algorithm, the Deep Deterministic Policy Gradient (DDPG) technique, to demonstrate the significant merits of our approach in terms of training efficiency.
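The abstract does not spell out the form of the structured linear parameterization. One common reading, assumed here, is a nominal open-loop control plus a linear feedback correction on the deviation from a nominal state, u = u_nom + K (x - x_nom). The minimal sketch below evaluates such a policy at a single time step; the function name, the gain K, and all numbers are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the policy form, names, and numbers are assumptions,
# not the authors' implementation.

def linear_feedback_policy(x, x_nom, u_nom, K):
    """Return u = u_nom + K (x - x_nom) for one time step.

    x, x_nom: state and nominal state (lists of floats, same length)
    u_nom:    nominal open-loop control (list of floats)
    K:        feedback gain as a list of rows, one row per control input
    """
    # Deviation of the actual state from the nominal trajectory.
    dx = [xi - xni for xi, xni in zip(x, x_nom)]
    # Linear correction K @ dx, computed row by row.
    correction = [sum(k * d for k, d in zip(row, dx)) for row in K]
    return [un + c for un, c in zip(u_nom, correction)]


# Hypothetical 2-state, 1-input example.
x_nom = [1.0, 0.0]
u_nom = [0.5]
K = [[-2.0, -1.0]]

# On the nominal trajectory the feedback term vanishes.
u_on_nominal = linear_feedback_policy(x_nom, x_nom, u_nom, K)
# Off the nominal trajectory the gain pushes the control to correct the deviation.
u_perturbed = linear_feedback_policy([1.5, 0.0], x_nom, u_nom, K)
```

Under this reading, the nominal pair (x_nom, u_nom) and the gain K can be designed separately, which is one way to interpret the decoupling the abstract refers to.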
