Paper Title
Momentum Q-learning with Finite-Sample Convergence Guarantee
Paper Authors
Paper Abstract
Existing studies indicate that momentum ideas from conventional optimization can be used to improve the performance of Q-learning algorithms. However, finite-sample analyses of momentum-based Q-learning algorithms are available only for the tabular case without function approximation. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes and generalizes existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish a convergence guarantee for MomentumQ with linear function approximation and Markovian sampling. In particular, we characterize a finite-sample convergence rate that is provably faster than that of vanilla Q-learning. This is the first finite-sample analysis of momentum-based Q-learning algorithms with function approximation. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than that of SpeedyQ \citep{azar2011speedy} when a special family of step sizes is chosen. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
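
To make the idea concrete, below is a minimal sketch of how a momentum term can be folded into a synchronous tabular Q-iteration. The abstract does not reproduce the paper's actual MomentumQ recursion (which combines Nesterov's and Polyak's schemes), so the heavy-ball-style combination, the function name momentum_q_sketch, and the coefficients alpha and beta here are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def momentum_q_sketch(P, R, gamma=0.99, alpha=0.5, beta=0.3, iters=500):
    """Illustrative tabular Q-iteration with a heavy-ball momentum term.

    P: transition tensor of shape (S, A, S); R: reward matrix of shape (S, A).
    NOTE: this is NOT the paper's MomentumQ update, only a sketch of the idea
    that the new iterate mixes the Bellman backup with the previous step
    direction, in the spirit of Polyak's momentum.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_prev = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                    # greedy value V(s) = max_a Q(s, a)
        bellman = R + gamma * (P @ V)        # synchronous Bellman backup T Q
        step = Q + alpha * (bellman - Q)     # vanilla Q-iteration step
        Q, Q_prev = step + beta * (Q - Q_prev), Q  # add heavy-ball momentum
    return Q

# Usage on a small random MDP (hypothetical example data):
rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # row-stochastic transitions
R = rng.uniform(size=(S, A))
Q_star = momentum_q_sketch(P, R)
```

Setting beta = 0 in this sketch recovers plain synchronous Q-iteration, which makes the role of the momentum term easy to isolate; the sketch carries no convergence guarantee of its own, in contrast to the finite-sample rates established in the paper.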