Paper Title
Graph-based State Representation for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Deep RL approaches build much of their success on the ability of the deep neural network to generate useful internal representations. Nevertheless, they suffer from high sample complexity, and starting from a good input representation can have a significant impact on performance. In this paper, we exploit the fact that the underlying Markov decision process (MDP) represents a graph, which enables us to incorporate topological information for effective state representation learning. Motivated by the recent success of node representations on several graph analytical tasks, we specifically investigate the capability of node representation learning methods to effectively encode the topology of the underlying MDP in Deep RL. To this end, we perform a comparative analysis of several models, chosen from four different classes of representation learning algorithms, for policy learning in grid-world navigation tasks, which are representative of a large class of RL problems. We find that all embedding methods outperform the commonly used matrix representation of grid-world environments in all of the studied cases. Moreover, graph convolution based methods are outperformed by simpler random walk based methods and graph linear autoencoders.
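To make the core idea concrete, the sketch below illustrates one way to view a grid-world MDP's state space as a graph and learn random-walk-based node embeddings (in the spirit of DeepWalk) that could replace the raw matrix/one-hot state representation fed to a policy network. This is a minimal illustration, not the paper's exact pipeline; the grid size, embedding dimension, and the use of networkx and gensim are all assumptions.

# Minimal sketch (assumptions: networkx + gensim; 8x8 grid; 16-dim embeddings).
# It is NOT the paper's implementation, only an illustration of the technique.
import random

import networkx as nx
from gensim.models import Word2Vec

# 1. The grid-world state space as an undirected graph: nodes are cells,
#    edges connect cells reachable in one step (the MDP's topology).
G = nx.grid_2d_graph(8, 8)                     # hypothetical 8x8 grid world
G = nx.convert_node_labels_to_integers(G)      # relabel cells as 0..63

# 2. Sample truncated random walks over the state graph (DeepWalk-style).
def random_walks(graph, num_walks=10, walk_length=20):
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes():
            walk = [node]
            while len(walk) < walk_length:
                walk.append(random.choice(list(graph.neighbors(walk[-1]))))
            walks.append([str(n) for n in walk])   # gensim expects string tokens
    return walks

# 3. Fit a skip-gram model on the walks; each state gets a low-dimensional
#    embedding that encodes the topology of the underlying MDP.
model = Word2Vec(random_walks(G), vector_size=16, window=5,
                 min_count=0, sg=1, epochs=5)

# 4. The embedding of a state would then be passed to the policy network
#    in place of the one-hot / matrix representation of the grid.
state = 27                                     # an arbitrary cell
print(model.wv[str(state)])                    # 16-dim state representation

Analogous sketches could swap step 3 for a graph linear autoencoder or a graph convolutional encoder, which are the other classes of representation learners compared in the paper.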