Paper Title

Temporal-Differential Learning in Continuous Environments

Authors

Bian, Tao; Jiang, Zhong-Ping

Abstract

This paper introduces a new reinforcement learning (RL) method, the method of temporal differential. In contrast to the traditional temporal-difference learning method, it plays a crucial role in developing novel RL techniques for continuous environments. In particular, the continuous-time least squares policy evaluation (CT-LSPE) and the continuous-time temporal-differential (CT-TD) learning methods are developed. Both theoretical and empirical evidence is provided to demonstrate the effectiveness of the proposed temporal-differential learning methodology.
