MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents

Paper title

MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents

Paper author

Chung, Stephen

Abstract

Nearly all state-of-the-art deep learning algorithms rely on error backpropagation, which is generally regarded as biologically implausible. An alternative way of training an artificial neural network is to treat each unit in the network as a reinforcement learning agent, so that the network is considered a team of agents. As such, all units can be trained by REINFORCE, a local learning rule modulated by a global signal that is more consistent with biologically observed forms of synaptic plasticity. Although this learning rule follows the gradient of return in expectation, it suffers from high variance and thus a low speed of learning, rendering it impractical for training deep networks. We therefore propose a novel algorithm called MAP propagation to reduce this variance significantly while retaining the local property of the learning rule. Experiments demonstrated that MAP propagation could solve common reinforcement learning tasks at a similar speed to backpropagation when applied to an actor-critic network. Our work thus allows for the broader application of teams of agents in deep reinforcement learning.
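
To make the "local learning rule modulated by a global signal" concrete, here is a minimal sketch of the standard REINFORCE update for a single Bernoulli-logistic unit. It is not the MAP propagation algorithm itself, which the abstract does not specify. The function names, the learning rate, and the choice of a Bernoulli-logistic unit are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def firing_probability(weights, inputs):
    """Probability that the unit outputs 1 given its own inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


def sample_output(weights, inputs, rng):
    """Sample the unit's stochastic binary output (its 'action')."""
    return 1 if rng.random() < firing_probability(weights, inputs) else 0


def reinforce_update(weights, inputs, output, reward, lr=0.1):
    """One REINFORCE step for a Bernoulli-logistic unit (illustrative).

    Local part:  (output - p) * x uses only this unit's input and output,
                 and equals the gradient of log P(output) w.r.t. the weights.
    Global part: `reward` is a scalar signal shared by every unit.
    """
    p = firing_probability(weights, inputs)
    return [w + lr * reward * (output - p) * x
            for w, x in zip(weights, inputs)]


# Usage: one unit samples an output, then updates from a shared reward.
rng = random.Random(0)
w = [0.0, 0.0]
x = [1.0, 2.0]
y = sample_output(w, x, rng)
w = reinforce_update(w, x, y, reward=1.0)
```

Because every unit scales its update by the same scalar reward, regardless of how much that unit contributed to it, the update is a noisy gradient estimate. This is the high variance the abstract refers to.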
