Paper Title
Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions
Paper Authors
Paper Abstract
Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising approach to safe RL by modifying the unsafe actions of an RL agent on the fly. Existing safety-filter-based approaches typically involve learning the uncertain dynamics and quantifying the learned model error, which leads to conservative filters until a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning; it leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with model-free RL algorithms by minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian process-based model learning in terms of safety violation rate and sample and computational efficiency.
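To make the safety-filter idea concrete, here is a minimal sketch (not the paper's implementation) of a robust CBF filter for a 1D single integrator x' = u + d, where d is an unknown disturbance and d_hat is its estimate, e.g., from a disturbance observer. All names, gains, and the error bound eps are illustrative assumptions; the paper's DOB design and multi-dimensional QP formulation are more involved.

```python
# Sketch of a robust CBF safety filter (hypothetical 1D example, not the
# paper's code). Dynamics: x' = u + d. Safe set: h(x) = x_max - x >= 0.
# Robust CBF condition with disturbance estimate d_hat and estimation-error
# bound eps:  h' = -(u + d_hat) - eps >= -alpha * h(x).
# In 1D this reduces to a closed-form clip that minimally modifies the RL
# action, mirroring the "minimal modification" role of the QP-based filter.

def cbf_safety_filter(x, u_rl, d_hat, x_max=1.0, alpha=2.0, eps=0.05):
    h = x_max - x                    # barrier value (>= 0 inside the safe set)
    u_max = alpha * h - d_hat - eps  # largest action satisfying the CBF condition
    return min(u_rl, u_max)          # pass u_rl through unless it violates safety

# Far from the boundary the RL action is untouched; near the boundary an
# aggressive action is clipped to the safe limit.
u_far = cbf_safety_filter(x=0.0, u_rl=0.5, d_hat=0.0)
u_near = cbf_safety_filter(x=0.9, u_rl=1.5, d_hat=0.1)
```

In higher dimensions the same condition becomes a linear constraint on u, and the filter is typically the quadratic program min ||u - u_rl||^2 subject to that constraint.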