Paper title
On the distance between two neural networks and the stability of learning
Paper authors
Paper abstract
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems to require little to no learning rate tuning, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is here: https://github.com/jxbz/fromage.
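To make the "learning rule" mentioned in the abstract concrete, below is a minimal NumPy sketch of a layerwise relative update in the spirit of the Fromage optimizer released with the paper: each layer's step is scaled by the ratio of its weight norm to its gradient norm, so the relative change per layer is roughly the learning rate. The function name, the fallback behaviour at zero norms, and the default learning rate are illustrative assumptions for this sketch; the authors' actual implementation is in the linked repository.

```python
import numpy as np

def fromage_style_update(weights, grads, lr=0.01):
    """Apply one layerwise update in the spirit of the paper's learning rule.

    weights, grads: lists of arrays, one entry per layer.
    Returns a new list of updated weight arrays.
    """
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        if w_norm > 0 and g_norm > 0:
            # Scale the step so the relative change ||step|| / ||w|| is about lr.
            step = lr * (w_norm / g_norm) * g
        else:
            # Fallback for zero-norm layers (illustrative choice): plain gradient step.
            step = lr * g
        # Rescale to counteract the slight norm growth introduced by the update.
        new_weights.append((w - step) / np.sqrt(1.0 + lr ** 2))
    return new_weights

# Example usage on a toy two-layer parameter list.
if __name__ == "__main__":
    weights = [np.random.randn(4, 3), np.random.randn(3)]
    grads = [np.random.randn(4, 3), np.random.randn(3)]
    weights = fromage_style_update(weights, grads, lr=0.01)
```

Note that the per-layer norm scaling is what removes most of the learning rate sensitivity: a single value of lr controls the relative step size of every layer, regardless of how differently the layers are scaled.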