Paper Title
Self-Imitation Learning via Generalized Lower Bound Q-learning
Paper Authors
Paper Abstract
Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound that generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms. To provide a formal motivation for the potential performance gains of self-imitation learning, we show that n-step lower-bound Q-learning achieves a trade-off between fixed-point bias and contraction rate, drawing close connections to the popular uncorrected n-step Q-learning. Finally, we show that n-step lower-bound Q-learning is a more robust alternative to both return-based self-imitation learning and uncorrected n-step Q-learning over a wide range of continuous control benchmark tasks.
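The core idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and arguments are hypothetical, and it assumes the standard construction where the discounted n-step return along an observed trajectory, bootstrapped with the current Q-estimate, serves as a lower bound on the optimal action value, so the training target can safely take the max of this lower bound and the ordinary one-step target.

```python
import numpy as np

def n_step_lower_bound_target(rewards, gamma, q_bootstrap, one_step_target):
    """Hypothetical sketch of an n-step lower-bound Q-learning target.

    rewards:         list of the n rewards r_0 ... r_{n-1} observed along
                     the trajectory
    gamma:           discount factor
    q_bootstrap:     Q-estimate at the state-action pair n steps ahead
    one_step_target: the usual one-step bootstrapped target r_0 + gamma*Q'

    The n-step return sum_t gamma^t r_t + gamma^n * q_bootstrap lower-bounds
    the optimal value along a (possibly suboptimal) trajectory, so taking the
    max with the one-step target only pushes the estimate upward, which is the
    self-imitation effect: good past returns are imitated, poor ones ignored.
    """
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    n_step_return = float(np.dot(discounts, rewards)) + gamma**n * q_bootstrap
    return max(n_step_return, one_step_target)
```

For example, with rewards `[1.0, 0.0]`, `gamma = 0.9`, and a bootstrap value of `10.0`, the two-step return is `1.0 + 0.81 * 10.0 = 9.1`; if the one-step target is only `5.0`, the lower-bound target keeps the larger value `9.1`, whereas a trajectory with a poor return leaves the one-step target unchanged.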