Paper Title

Reinforcement Learning with Goal-Distance Gradient

Authors

Kai Jiang, XiaoLong Qin

Abstract

Reinforcement learning usually uses feedback rewards from the environment to train agents. However, rewards in real environments are sparse, and some environments provide no rewards at all. Most current methods struggle to achieve good performance in sparse-reward or reward-free environments. Although shaped rewards are effective for solving sparse-reward tasks, they are limited to specific problems, and learning remains susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the sparse-reward problem in general environments. Our method uses the minimum number of transitions between states as a distance that replaces environmental rewards, and proposes a goal-distance gradient to achieve policy improvement. Based on the characteristics of our method, we also introduce a bridge-point planning method to improve exploration efficiency, thereby solving more complex tasks. Experiments show that our method performs better than previous work on sparse-reward and local-optimum problems in complex environments.
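The abstract's core idea, using the minimum number of transitions between states as a goal distance and acting to decrease it, can be illustrated with a minimal tabular sketch. This is a hypothetical toy example on a deterministic chain environment: the `step` and `greedy_action` names and the value-iteration update are our own illustration, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical toy sketch (not the paper's exact method): learn
# d(s, g) = minimum number of transitions from state s to goal g on a
# small deterministic chain, then act greedily to decrease that distance.

N = 7  # states 0..6; actions are -1 (left) and +1 (right), clamped at the ends

def step(s, a):
    return min(max(s + a, 0), N - 1)

# Tabular distance estimate d[s, g], initialized to infinity except d(g, g) = 0
d = np.full((N, N), np.inf)
np.fill_diagonal(d, 0.0)

# Bellman-style sweeps: d(s, g) <- 1 + min_a d(step(s, a), g)
for _ in range(N):
    for g in range(N):
        for s in range(N):
            if s != g:
                d[s, g] = 1.0 + min(d[step(s, a), g] for a in (-1, +1))

def greedy_action(s, g):
    # Choose the action whose successor has the smallest learned distance to g
    return min((-1, +1), key=lambda a: d[step(s, a), g])

# Greedy rollout from state 0 toward goal 5
s, steps = 0, 0
while s != 5:
    s = step(s, greedy_action(s, 5))
    steps += 1
print(d[0, 5], steps)  # prints "5.0 5": learned distance matches rollout length
```

The point of the sketch is that no environment reward is used anywhere: the distance table alone, trained from transitions, is enough to define an improving policy, which is the role the goal-distance gradient plays in the paper's function-approximation setting.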
