Paper Title

Deep Radial-Basis Value Functions for Continuous Control

Paper Authors

Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

Paper Abstract

A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can represent any true value function owing to their support for universal function approximation. We extend the standard DQN algorithm to continuous control by endowing the agent with a deep RBVF. We show that the resultant agent, called RBF-DQN, significantly outperforms value-function-only baselines, and is competitive with state-of-the-art actor-critic algorithms.
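
The abstract describes a Q-network whose output layer combines state-dependent radial-basis functions over the action input, and approximates the maximum action-value by evaluating the learned Q-function at a small set of candidate actions (the RBF centroids). The PyTorch sketch below illustrates one plausible instantiation of that idea under stated assumptions; the class name RBFValueFunction, the normalized (softmax-weighted) RBF combination, and all hyperparameters are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code) of a deep RBF value function.
import torch
import torch.nn as nn

class RBFValueFunction(nn.Module):
    """Q(s, a) as a normalized radial-basis combination of state-dependent
    centroid actions a_i(s) with values v_i(s); architecture details assumed."""
    def __init__(self, state_dim, action_dim, num_centroids=32, hidden=256, beta=1.0):
        super().__init__()
        self.beta = beta                      # RBF temperature (assumed fixed here)
        self.num_centroids = num_centroids
        self.action_dim = action_dim
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Heads mapping the state to centroid actions and centroid values.
        self.centroid_head = nn.Linear(hidden, num_centroids * action_dim)
        self.value_head = nn.Linear(hidden, num_centroids)

    def centroids_and_values(self, state):
        h = self.body(state)
        centroids = self.centroid_head(h).view(-1, self.num_centroids, self.action_dim)
        values = self.value_head(h)                                  # (batch, N)
        return centroids, values

    def forward(self, state, action):
        centroids, values = self.centroids_and_values(state)
        # Negative scaled distances act as RBF logits; softmax normalizes them.
        dist = torch.norm(action.unsqueeze(1) - centroids, dim=-1)   # (batch, N)
        weights = torch.softmax(-self.beta * dist, dim=-1)
        return (weights * values).sum(dim=-1)                        # Q(s, a)

    def approx_max(self, state):
        # Approximate max_a Q(s, a) by evaluating Q at each centroid action.
        centroids, _ = self.centroids_and_values(state)
        q_at_centroids = torch.stack(
            [self.forward(state, centroids[:, i]) for i in range(self.num_centroids)],
            dim=-1)
        return q_at_centroids.max(dim=-1).values
```

In a DQN-style continuous-control loop, `approx_max` would stand in for the max over discrete actions when forming bootstrap targets, which is the role the abstract attributes to the easy and accurate maximization over a deep RBVF.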
