Paper title
Developmental Reinforcement Learning of Control Policy of a Quadcopter UAV with Thrust Vectoring Rotors
Authors
Abstract
In this paper, we present a novel developmental reinforcement learning-based controller for a quadcopter with thrust vectoring capabilities. This multirotor UAV design has tilt-enabled rotors and uses both the magnitude and the direction of each rotor force to reach the desired state during flight. The control policy for this robot is learned by policy transfer from the learned controller of a conventional quadcopter (a comparatively simple UAV design without thrust vectoring). This approach allows learning a control policy for systems with multiple inputs and multiple outputs. The performance of the learned policy is evaluated in physics-based simulations on the tasks of hovering and waypoint navigation. The flight simulations use a flight controller based on reinforcement learning without any additional PID components. The results show that the presented approach learns faster than learning the control policy from scratch for this new UAV design, which is created by modifying a conventional quadcopter to add more degrees of freedom (from 4 actuators in the conventional quadcopter to 8 actuators in the tilt-rotor quadcopter). We demonstrate the robustness of the learned policy by showing, in simulation, the recovery of the tilt-rotor platform from various non-static initial conditions to reach a desired state. The developmental policy for the tilt-rotor UAV also showed superior fault tolerance compared with the policy learned from scratch. The results show the ability of the presented approach to bootstrap the learned behavior from a simpler system (lower-dimensional action space) to a more complex robot (comparatively higher-dimensional action space) and reach better performance faster.
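The abstract does not say how the quadcopter policy is transferred to the tilt-rotor design. One simple way to picture bootstrapping from a 4-dimensional to an 8-dimensional action space is to widen the output layer of a trained network, copying the learned weights for the original actions and zero-initialising the new ones. The sketch below is only an illustration under that assumption, not the authors' method: the network architecture, the observation size of 18, the hidden size of 64, and the function names `init_policy`, `act`, and `expand_policy` are all hypothetical.

```python
import numpy as np


def init_policy(obs_dim, hidden_dim, act_dim, rng):
    """Randomly initialise a one-hidden-layer tanh policy (hypothetical architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(obs_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, act_dim)),
        "b2": np.zeros(act_dim),
    }


def act(params, obs):
    """Deterministic forward pass: observation -> actions in [-1, 1]."""
    hidden = np.tanh(obs @ params["W1"] + params["b1"])
    return np.tanh(hidden @ params["W2"] + params["b2"])


def expand_policy(params, new_act_dim):
    """Widen the output layer, keeping the old action weights.

    The added output columns and biases are zero, so the new actions
    start at tanh(0) = 0 and the old actions are reproduced exactly.
    """
    hidden_dim, old_act_dim = params["W2"].shape
    if new_act_dim < old_act_dim:
        raise ValueError("new_act_dim must be >= the current action dimension")
    W2 = np.zeros((hidden_dim, new_act_dim))
    b2 = np.zeros(new_act_dim)
    W2[:, :old_act_dim] = params["W2"]
    b2[:old_act_dim] = params["b2"]
    return {
        "W1": params["W1"].copy(),
        "b1": params["b1"].copy(),
        "W2": W2,
        "b2": b2,
    }


rng = np.random.default_rng(0)
# Stand-in for a trained quadcopter policy with 4 rotor commands (assumed sizes).
quad_policy = init_policy(obs_dim=18, hidden_dim=64, act_dim=4, rng=rng)
# Tilt-rotor policy with 8 actuators: 4 rotor commands plus 4 tilt commands.
tilt_policy = expand_policy(quad_policy, new_act_dim=8)

obs = rng.normal(size=18)
quad_action = act(quad_policy, obs)
tilt_action = act(tilt_policy, obs)
```

With this initialisation the tilt-rotor policy starts out behaving like the quadcopter policy with zero tilt commands, and subsequent training is free to adjust all eight outputs. Whether a zero tilt command corresponds to upright rotors depends on how actions are mapped to actuators, which is also an assumption here.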