Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Paper Title

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Authors

Okawa, Yoshihiro; Sasaki, Tomotake; Yanami, Hitoshi; Namerikawa, Toru

Abstract

Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take the risk into consideration when we apply those algorithms to safety-critical problems especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define the safety during learning as satisfaction of the constraint conditions explicitly defined in terms of the state and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions to construct conservative inputs not containing an exploring aspect used in the proposed method and prove that the safety in the above explained sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
