Paper Title
Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks
Paper Authors
Paper Abstract
The state-of-the-art machine learning approach to training deep neural networks, backpropagation, is implausible for real neural networks: neurons need to know their outgoing weights; training alternates between a bottom-up forward pass (computation) and a top-down backward pass (learning); and the algorithm often needs precise labels for many data points. Biologically plausible approximations to backpropagation, such as feedback alignment, solve the weight transport problem, but not the other two. Thus, fully biologically plausible learning rules have so far remained elusive. Here we present a family of learning rules that does not suffer from any of these problems. It is motivated by the information bottleneck principle (extended with kernel methods), in which networks learn to compress the input as much as possible without sacrificing prediction of the output. The resulting rules have a 3-factor Hebbian structure: they require pre- and postsynaptic firing rates and an error signal (the third factor) consisting of a global teaching signal and a layer-specific term, both available without a top-down pass. They do not require precise labels; instead, they rely on the similarity between pairs of desired outputs. Moreover, to obtain good performance on hard problems and retain biological plausibility, our rules need divisive normalization, a known feature of biological networks. Finally, simulations show that our rules perform nearly as well as backpropagation on image classification tasks.
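The abstract describes the method only at a high level, so the Python sketch below is illustrative rather than the paper's actual algorithm. It makes two assumptions: that the kernelized dependence measure is of the standard (biased) empirical HSIC form, and that a toy version of the 3-factor rule can be written with the third factor as a global teaching signal (similarity of the desired outputs for a pair of stimuli) minus a layer-specific term (similarity of the layer's own responses), gating a Hebbian product of pre- and postsynaptic rates after a toy divisive normalization. All function names (gaussian_kernel, hsic, three_factor_update) and constants are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian similarities between rows of a (n, d) and b (m, d)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC estimate from two n x n kernel (similarity) matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def three_factor_update(W, x_pair, y_sim, lr=1e-3, gamma=2.0):
    """Toy 3-factor Hebbian update of one layer's weights W (n_out, n_in)
    for a pair of stimuli x_pair (2, n_in).

    y_sim is the global teaching signal: the similarity of the two desired
    outputs (e.g. +1 for the same class, -1 otherwise); no precise labels
    and no top-down backward pass are needed."""
    z = np.maximum(W @ x_pair.T, 0.0).T            # (2, n_out) postsynaptic rates
    z = z / (1.0 + z.mean(axis=1, keepdims=True))  # toy divisive normalization
    layer_sim = gaussian_kernel(z[:1], z[1:])[0, 0]  # layer-specific similarity term
    third_factor = y_sim - gamma * layer_sim       # error signal: global minus layer-specific
    dW = third_factor * (z.T @ x_pair) / 2.0       # third factor gates the Hebbian post x pre product
    return W + lr * dW

# Toy usage: one update on a pair of stimuli whose desired outputs are similar.
W = rng.normal(scale=0.1, size=(5, 8))
x_pair = rng.normal(size=(2, 8))
W = three_factor_update(W, x_pair, y_sim=+1.0)

A rule of this shape pushes a layer's representations of stimuli with similar desired outputs together and dissimilar ones apart, which is how a pairwise-similarity signal can substitute for precise labels.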