Paper Title
Work in Progress: Temporally Extended Auxiliary Tasks
Paper Authors
Paper Abstract
Predictive auxiliary tasks have been shown to improve performance in numerous reinforcement learning works; however, this effect is still not well understood. The primary purpose of the work presented here is to investigate the impact that an auxiliary task's prediction timescale has on the agent's policy performance. We consider auxiliary tasks that learn to make on-policy predictions using temporal difference learning. We test the impact of prediction timescale using a specific form of auxiliary task in which the input image is used as the prediction target, which we refer to as temporal difference autoencoders (TD-AE). We empirically evaluate the effect of TD-AE on the A2C algorithm in the VizDoom environment using different prediction timescales. While we do not observe a clear relationship between the prediction timescale and performance, we make the following observations: 1) using auxiliary tasks allows us to reduce the trajectory length of the A2C algorithm, 2) in some cases, temporally extended TD-AE performs better than a straight autoencoder, 3) performance with auxiliary tasks is sensitive to the weight placed on the auxiliary loss, and 4) despite this sensitivity, auxiliary tasks improved performance without extensive hyperparameter tuning. Our overall conclusions are that TD-AE increases the robustness of the A2C algorithm to the trajectory length and that, while the approach is promising, further study is required to fully understand the relationship between auxiliary task prediction timescale and the agent's performance.
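To make the idea of a prediction timescale concrete, the following is a minimal sketch of how a TD-AE style auxiliary loss might be computed. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the normalized one-step target `(1 - gamma) * x_t + gamma * v(s_{t+1})`, the use of a discount `gamma` to set the timescale (roughly `1 / (1 - gamma)` steps), the mean squared error loss, and all function and parameter names. Under these assumptions, `gamma = 0` reduces the target to the input image itself, which corresponds to a plain autoencoder.

```python
from typing import List

# Hypothetical type alias: a flattened image, one float per pixel.
Image = List[float]


def timescale(gamma: float) -> float:
    """Approximate prediction horizon, in steps, implied by a discount gamma < 1."""
    return 1.0 / (1.0 - gamma)


def td_ae_target(image: Image, next_prediction: Image, gamma: float) -> Image:
    """One-step TD target (assumed form, not taken from the paper).

    Blends the current input image with the bootstrapped prediction made
    from the next state. With gamma = 0 the target is the image itself.
    """
    return [(1.0 - gamma) * x + gamma * v for x, v in zip(image, next_prediction)]


def td_ae_loss(prediction: Image, target: Image) -> float:
    """Mean squared error between the prediction and the TD target."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


def total_loss(a2c_loss: float, aux_loss: float, aux_weight: float) -> float:
    """Combine the A2C loss with the weighted auxiliary loss (assumed additive)."""
    return a2c_loss + aux_weight * aux_loss
```

In this sketch, `aux_weight` plays the role of the weight placed on the auxiliary loss, to which the abstract reports performance is sensitive, and sweeping `gamma` is one plausible way to vary the prediction timescale.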