Paper Title
Learning High-Level Policies for Model Predictive Control
Paper Authors
Paper Abstract
The combination of policy search and deep neural networks holds the promise of automating a variety of decision-making tasks. Model Predictive Control (MPC) provides robust solutions to robot control tasks by making use of a dynamical model of the system and solving an optimization problem online over a short planning horizon. In this work, we leverage probabilistic decision-making approaches and the generalization capability of artificial neural networks to complement the powerful online optimization of MPC by learning a deep high-level policy for the MPC (High-MPC). Conditioned on the robot's local observations, the trained neural network policy is capable of adaptively selecting high-level decision variables for the low-level MPC controller, which then generates optimal control commands for the robot. First, we formulate the search for high-level decision variables for MPC as a policy search problem, specifically, a probabilistic inference problem that can be solved in closed form. Second, we propose a self-supervised learning algorithm for learning a neural network high-level policy, which is useful for online hyperparameter adaptation in highly dynamic environments. We demonstrate the importance of incorporating online adaptation into autonomous robots by using the proposed method to solve a challenging control problem, where the task is to control a simulated quadrotor to fly through a swinging gate. We show that our approach can handle situations that are difficult for standard MPC.
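To make the two-step idea more concrete, the sketch below illustrates, under assumptions not stated in the abstract, how the closed-form policy search and the self-supervised data collection could fit together. It assumes the high-level decision variable is a single scalar (e.g., a traversal time for the gate), that `mpc_cost` is a hypothetical stand-in for solving the low-level MPC and returning its trajectory cost, and that `sample_obs` is a hypothetical environment sampler; the weighted maximum-likelihood update over sampled decision variables is one standard way to realize a closed-form probabilistic-inference step, not necessarily the exact formulation used by the authors.

```python
import numpy as np

def closed_form_policy_search(obs, mpc_cost, n_iters=10, n_samples=32,
                              mu0=1.0, sigma0=0.5, beta=3.0):
    """Search a scalar high-level decision variable z for the low-level MPC.

    Samples z from a Gaussian, scores each sample with the MPC trajectory
    cost, and updates the Gaussian with a closed-form weighted
    maximum-likelihood step (weights proportional to exp(-beta * cost)).
    `mpc_cost(obs, z)` is a hypothetical stand-in for running the MPC
    conditioned on z and returning the resulting cost."""
    mu, sigma = mu0, sigma0
    for _ in range(n_iters):
        z = np.random.normal(mu, sigma, size=n_samples)
        costs = np.array([mpc_cost(obs, zi) for zi in z])
        w = np.exp(-beta * (costs - costs.min()))   # lower cost -> higher weight
        w /= w.sum()
        mu = np.sum(w * z)                                   # closed-form mean update
        sigma = np.sqrt(np.sum(w * (z - mu) ** 2) + 1e-6)    # closed-form std update
    return mu

def collect_self_supervised_data(sample_obs, mpc_cost, n_points=1000):
    """Build a self-supervised dataset of (observation, z*) pairs.

    Each observation (e.g., robot state plus relative gate state) is labeled
    with the decision variable found by the closed-form search; the pairs can
    then be used to train the high-level neural network policy by regression.
    `sample_obs` is a hypothetical sampler of local observations."""
    data = []
    for _ in range(n_points):
        obs = sample_obs()
        z_star = closed_form_policy_search(obs, mpc_cost)
        data.append((obs, z_star))
    return data
```

At deployment, the trained network would take the place of the sampling-based search, predicting the decision variable directly from the current observation so that the low-level MPC can adapt online at every control step.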