Gradient dynamics in reinforcement learning

Paper title

Gradient dynamics in reinforcement learning

Authors

Fabbricatore, Riccardo; Palyulin, Vladimir V.

Abstract

Despite the success achieved by the analysis of supervised learning algorithms in the framework of statistical mechanics, reinforcement learning has remained largely untouched. Here we move towards closing the gap by analyzing the dynamics of the policy gradient algorithm. For a convex problem, we show that it obeys a drift-diffusion motion with coefficients tuned by the learning rate. Furthermore, we propose a mapping between a non-convex reinforcement learning problem and a disordered system. This mapping enables us to show how the learning rate acts as an effective temperature and is thus capable of smoothing rough landscapes, corroborating what is displayed by the drift-diffusive description and paving the way for physics-inspired algorithmic optimization based on annealing procedures in disordered systems.
