Biologically-plausible backpropagation through arbitrary timespans via local neuromodulators

Paper title

Biologically-plausible backpropagation through arbitrary timespans via local neuromodulators

Authors

Liu, Yuhan Helena, Smith, Stephen, Mihalas, Stefan, Shea-Brown, Eric, Sümbül, Uygar

Abstract

The spectacular successes of recurrent neural network models where key parameters are adjusted via backpropagation-based gradient descent have inspired much thought as to how biological neuronal networks might solve the corresponding synaptic credit assignment problem. There is so far little agreement, however, as to how biological networks could implement the necessary backpropagation through time, given widely recognized constraints of biological synaptic network signaling architectures. Here, we propose that extra-synaptic diffusion of local neuromodulators such as neuropeptides may afford an effective mode of backpropagation lying within the bounds of biological plausibility. Going beyond existing temporal truncation-based gradient approximations, our approximate gradient-based update rule, ModProp, propagates credit information through arbitrary time steps. ModProp suggests that modulatory signals can act on receiving cells by convolving their eligibility traces via causal, time-invariant and synapse-type-specific filter taps. Our mathematical analysis of ModProp learning, together with simulation results on benchmark temporal tasks, demonstrates the advantage of ModProp over existing biologically-plausible temporal credit assignment rules. These results suggest a potential neuronal mechanism for signaling credit information related to recurrent interactions over a longer time horizon. Finally, we derive an in-silico implementation of ModProp that could serve as a low-complexity and causal alternative to backpropagation through time.
