Deep Active Inference for Partially Observable MDPs

Paper Title

Deep Active Inference for Partially Observable MDPs

Authors

Otto van der Himst, Pablo Lanillos

Abstract

Deep active inference has been proposed as a scalable approach to perception and action that deals with large policy and state spaces. However, current models are limited to fully observable domains. In this paper, we describe a deep active inference model that can learn successful policies directly from high-dimensional sensory inputs. The deep learning architecture optimizes a variant of the expected free energy and encodes the continuous state representation by means of a variational autoencoder. We show, in the OpenAI benchmark, that our approach achieves performance comparable to or better than deep Q-learning, a state-of-the-art deep reinforcement learning algorithm.
