Paper Title

Optimisation of Structured Neural Controller Based on Continuous-Time Policy Gradient

Authors

Namhoon Cho, Hyo-Sang Shin

Abstract

This study presents a policy optimisation framework for structured nonlinear control of continuous-time (deterministic) dynamic systems. The proposed approach prescribes a structure for the controller based on relevant scientific knowledge (such as Lyapunov stability theory or domain experience) and treats the tunable elements inside that structure as the points to be parametrised with neural networks. To optimise a cost expressed as a function of the neural network weights, the approach uses the continuous-time policy gradient method based on adjoint sensitivity analysis as a means of computing the cost gradient correctly and efficiently. This makes it possible to combine the stability, robustness, and physical interpretability of an analytically derived feedback controller structure with the representational flexibility and optimised performance provided by machine learning techniques. Such a hybrid paradigm for fixed-structure control synthesis is particularly useful for optimising adaptive nonlinear controllers to achieve improved performance in online operation, an area where the existing theory dictates the design of the structure but lacks a clear analytical understanding of how to tune the gains and the uncertainty model basis functions that govern the performance characteristics. Numerical experiments on aerospace applications illustrate the utility of the structured nonlinear controller optimisation framework.
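To make the adjoint-based gradient computation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical scalar plant x' = a*x + b*u with the fixed feedback structure u = -theta*x, where the single tunable gain `theta` stands in for the neural network weights, and a quadratic running cost L = q*x^2 + r*u^2. All names and numerical values (`A`, `B`, `Q`, `R`, `X0`, `T`, `N`, `forward`, `adjoint_gradient`) are illustrative assumptions. The adjoint variable satisfies lam' = -dL/dx - lam*df/dx with lam(T) = 0, and the gradient is the integral of dL/dtheta + lam*df/dtheta over the horizon.

```python
# Assumed toy problem: plant x' = A*x + B*u, structured controller u = -theta*x.
A, B = 1.0, 1.0             # open-loop unstable plant (assumed values)
Q, R = 1.0, 1.0             # running-cost weights (assumed values)
X0, T, N = 1.0, 5.0, 5000   # initial state, horizon, number of RK4 steps


def rk4_step(f, z, h):
    """One classical Runge-Kutta step for z' = f(z); h may be negative."""
    k1 = f(z)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)])
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)])
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
            for zi, s1, s2, s3, s4 in zip(z, k1, k2, k3, k4)]


def forward(theta):
    """Integrate the state and the accumulated cost from 0 to T.

    Returns (x(T), J), where J is the integral of the running cost.
    """
    c = A - B * theta            # closed-loop pole
    w = Q + R * theta ** 2       # running cost is w * x^2
    f = lambda z: [c * z[0], w * z[0] ** 2]
    z = [X0, 0.0]
    h = T / N
    for _ in range(N):
        z = rk4_step(f, z, h)
    return z[0], z[1]


def adjoint_gradient(theta):
    """Return (J, dJ/dtheta) from one forward and one backward ODE solve."""
    c = A - B * theta
    w = Q + R * theta ** 2
    x_T, cost = forward(theta)

    def f(z):
        x, lam, g = z
        return [
            c * x,                                      # state dynamics
            -2.0 * w * x - c * lam,                     # -dL/dx - lam*df/dx
            -(2.0 * R * theta * x ** 2 - B * x * lam),  # -(dL/dtheta + lam*df/dtheta)
        ]

    # Terminal conditions lam(T) = 0 and g(T) = 0; integrate backwards to t = 0.
    z = [x_T, 0.0, 0.0]
    h = -T / N
    for _ in range(N):
        z = rk4_step(f, z, h)
    return cost, z[2]            # g(0) is the cost gradient


if __name__ == "__main__":
    theta = 2.0
    for _ in range(5):           # a few plain gradient-descent steps
        J, grad = adjoint_gradient(theta)
        print(f"theta={theta:.4f}  J={J:.6f}  dJ/dtheta={grad:.6f}")
        theta -= 0.5 * grad
```

The backward pass re-integrates the state together with the adjoint, so no trajectory has to be stored. That is convenient for this short-horizon toy problem but can be numerically delicate for strongly stable dynamics, where checkpointing or interpolating the forward trajectory is a common alternative. With a neural network in place of `theta`, the partial derivatives with respect to the weights would typically come from automatic differentiation rather than being written by hand.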
