Training Efficient Controllers via Analytic Policy Gradient

Paper title

Training Efficient Controllers via Analytic Policy Gradient

Authors

Nina Wiedemann, Valentin Wüest, Antonio Loquercio, Matthias Müller, Dario Floreano, Davide Scaramuzza

Abstract

Control design for robotic systems is complex and often requires solving an optimization to follow a trajectory accurately. Online optimization approaches like Model Predictive Control (MPC) have been shown to achieve great tracking performance, but require high computing power. Conversely, learning-based offline optimization approaches, such as Reinforcement Learning (RL), allow fast and efficient execution on the robot but hardly match the accuracy of MPC in trajectory tracking tasks. In systems with limited compute, such as aerial vehicles, an accurate controller that is efficient at execution time is imperative. We propose an Analytic Policy Gradient (APG) method to tackle this problem. APG exploits the availability of differentiable simulators by training a controller offline with gradient descent on the tracking error. We address training instabilities that frequently occur with APG through curriculum learning and experiment on a widely used controls benchmark, the CartPole, and two common aerial robots, a quadrotor and a fixed-wing drone. Our proposed method outperforms both model-based and model-free RL methods in terms of tracking error. Concurrently, it achieves similar performance to MPC while requiring more than an order of magnitude less computation time. Our work provides insights into the potential of APG as a promising control method for robotics. To facilitate the exploration of APG, we open-source our code and make it available at https://github.com/lis-epfl/apg_trajectory_tracking.
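
To make the core idea concrete, the following is a minimal sketch of gradient descent on a tracking error through a differentiable simulator. It is not the authors' implementation: the 1D system, the single-gain policy, the hand-derived sensitivity recursion (standing in for automatic differentiation), the horizon-based curriculum, and every name and hyperparameter below are assumptions chosen for illustration. The abstract does not specify what form the curriculum takes.

```python
def rollout_loss_and_grad(k, ref, dt=0.1):
    """Simulate the toy system x_{t+1} = x_t + dt * u_t with u_t = k * (ref[t+1] - x_t).

    Returns the mean squared tracking error and its exact derivative with
    respect to the gain k. The derivative is obtained by propagating the
    sensitivity s_t = dx_t/dk through the dynamics, which is what an autodiff
    framework would do automatically for a differentiable simulator.
    """
    x, s = 0.0, 0.0          # state and its sensitivity to k
    loss, grad = 0.0, 0.0
    steps = len(ref) - 1
    for t in range(steps):
        err_in = ref[t + 1] - x                 # error fed to the policy
        x_next = x + dt * k * err_in            # one simulator step
        s_next = s + dt * (err_in - k * s)      # d(x_next)/dk via the chain rule
        track_err = x_next - ref[t + 1]
        loss += track_err ** 2 / steps
        grad += 2.0 * track_err * s_next / steps
        x, s = x_next, s_next
    return loss, grad


def train(ref, k=0.0, lr=0.3, iters=200, horizons=(5, 10, 20)):
    """Plain gradient descent on the tracking error.

    The growing horizon is a hypothetical curriculum: short rollouts first,
    longer ones later, so early gradients stay well behaved.
    """
    for h in horizons:
        segment = ref[: h + 1]
        for _ in range(iters):
            _, g = rollout_loss_and_grad(k, segment)
            k -= lr * g
    return k


reference = [1.0] * 21  # step reference: hold position 1.0 for 20 steps
k_trained = train(reference)
loss_before, _ = rollout_loss_and_grad(0.0, reference)
loss_after, _ = rollout_loss_and_grad(k_trained, reference)
print(f"gain {k_trained:.3f}: loss {loss_before:.4f} -> {loss_after:.4f}")
```

At execution time the trained policy is a single multiplication per step, which mirrors the abstract's point that the optimization cost is paid offline rather than on the robot. A realistic setup would replace the scalar gain with a neural network and the hand-written recursion with an autodiff library.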
