Paper Title

NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration

Paper Authors

Shuai Han, Wenbo Zhou, Jing Liu, Shuai Lü

Paper Abstract

Deep reinforcement learning has been applied more and more widely in recent years, especially to complex control tasks. Effective exploration with noisy networks is one of the most important issues in deep reinforcement learning. Noisy networks tend to produce stable outputs for agents. However, this tendency is not always enough to find a stable policy for an agent, which decreases efficiency and stability during the learning process. Based on NoisyNets, this paper proposes an algorithm called NROWAN-DQN, i.e., Noise Reduction and Online Weight Adjustment NoisyNet-DQN. First, we develop a novel noise-reduction method for NoisyNet-DQN that makes the agent perform stable actions. Second, we design an online weight-adjustment strategy for noise reduction, which further improves stability and yields higher scores for the agent. Finally, we evaluate the algorithm in four standard domains and analyze the properties of its hyper-parameters. Our results show that NROWAN-DQN outperforms prior algorithms in all of these domains. In addition, NROWAN-DQN exhibits better stability: the variance of its score is significantly reduced, especially in action-sensitive environments. This means that in environments where high stability is required, NROWAN-DQN will be more appropriate than NoisyNet-DQN.
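To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of a NoisyNet-style factorised-Gaussian layer together with an illustrative noise-reduction penalty and an online weight schedule. It is not the authors' reference implementation: the names `NoisyLinear`, `noise_level`, `nrowan_loss`, `adjust_k`, `k_final`, `score_low`, and `score_high` are assumptions introduced here, and the exact form of the penalty D and of the schedule for its weight k in the paper may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy layer in the style of NoisyNets."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable means and noise scales for weights and biases.
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 / math.sqrt(in_features))
        nn.init.constant_(self.sigma_b, sigma0 / math.sqrt(in_features))

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        # Factorised-noise transform f(x) = sgn(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fresh factorised noise on every forward pass.
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        w = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, w, b)

    def noise_level(self) -> torch.Tensor:
        # D-like term: aggregate noise magnitude of this (output) layer.
        # The paper's exact normalisation of D is not reproduced here.
        return self.sigma_w.abs().sum() + self.sigma_b.abs().sum()


def nrowan_loss(td_loss: torch.Tensor, output_layer: NoisyLinear, k: float) -> torch.Tensor:
    """Noise reduction: standard TD loss plus a weighted penalty k * D."""
    return td_loss + k * output_layer.noise_level()


def adjust_k(score: float, score_low: float, score_high: float, k_final: float) -> float:
    """Online weight adjustment (assumed schedule): ramp k from 0 toward
    k_final as the agent's recent score improves, so noise is suppressed
    only once the policy starts performing well."""
    frac = (score - score_low) / max(score_high - score_low, 1e-8)
    return k_final * min(max(frac, 0.0), 1.0)
```

In a training loop one would compute `td_loss` from the usual DQN target, call `adjust_k` with a running average of recent episode scores, and minimise `nrowan_loss`; the linear ramp is only one plausible way to realise the score-dependent adjustment the abstract describes.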
