Paper Title
Momentum Q-learning with Finite-Sample Convergence Guarantee
Paper Authors
Paper Abstract
Existing studies indicate that momentum ideas from conventional optimization can be used to improve the performance of Q-learning algorithms. However, finite-sample analyses of momentum-based Q-learning algorithms are available only for the tabular case without function approximation. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes and generalizes existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish a convergence guarantee for MomentumQ with linear function approximation and Markovian sampling. In particular, we characterize a finite-sample convergence rate that is provably faster than that of vanilla Q-learning. This is the first finite-sample analysis of momentum-based Q-learning algorithms with function approximation. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than that of SpeedyQ \citep{azar2011speedy} when a special family of step sizes is chosen. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
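
To make the idea concrete, below is a minimal sketch of how a momentum term can be folded into a synchronous tabular Q-iteration. The abstract does not reproduce the paper's actual MomentumQ recursion (which combines Nesterov's and Polyak's schemes), so the heavy-ball-style combination, the function name momentum_q_sketch, and the coefficients alpha and beta here are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def momentum_q_sketch(P, R, gamma=0.99, alpha=0.5, beta=0.3, iters=500):
    """Illustrative tabular Q-iteration with a heavy-ball momentum term.

    P: transition tensor of shape (S, A, S); R: reward matrix of shape (S, A).
    NOTE: this is NOT the paper's MomentumQ update, only a sketch of the idea
    that the new iterate mixes the Bellman backup with the previous step
    direction, in the spirit of Polyak's momentum.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_prev = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                    # greedy value V(s) = max_a Q(s, a)
        bellman = R + gamma * (P @ V)        # synchronous Bellman backup T Q
        step = Q + alpha * (bellman - Q)     # vanilla Q-iteration step
        Q, Q_prev = step + beta * (Q - Q_prev), Q  # add heavy-ball momentum
    return Q

# Usage on a small random MDP (hypothetical example data):
rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # row-stochastic transitions
R = rng.uniform(size=(S, A))
Q_star = momentum_q_sketch(P, R)
```

Setting beta = 0 in this sketch recovers plain synchronous Q-iteration, which makes the role of the momentum term easy to isolate; the sketch carries no convergence guarantee of its own, in contrast to the finite-sample rates established in the paper.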