Paper Title

A Unifying View on Implicit Bias in Training Linear Neural Networks

Paper Authors

Chulhee Yun, Shankar Krishnan, Hossein Mobahi

Paper Abstract

We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training. We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases, and investigate the linear version of the formulation called linear tensor networks. With this formulation, we can characterize the convergence direction of the network parameters as singular vectors of a tensor defined by the network. For $L$-layer linear tensor networks that are orthogonally decomposable, we show that gradient flow on separable classification finds a stationary point of the $\ell_{2/L}$ max-margin problem in a "transformed" input space defined by the network. For underdetermined regression, we prove that gradient flow finds a global minimum which minimizes a norm-like function that interpolates between weighted $\ell_1$ and $\ell_2$ norms in the transformed input space. Our theorems subsume existing results in the literature while removing standard convergence assumptions. We also provide experiments that corroborate our analysis.
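
Illustrative Code Sketch

The regression claim in the abstract can be observed in the simplest special case it mentions: a depth-$L$ diagonal linear network, for which the "transformed" input space is essentially the original coordinates. The sketch below is an illustrative toy, not the paper's tensor formulation: it trains a depth-2 diagonal network with small-step-size gradient descent (a stand-in for gradient flow) on an underdetermined regression problem and compares the end-to-end coefficients with the minimum-$\ell_2$-norm interpolant. With small initialization, the network's solution has a markedly smaller $\ell_1$ norm, consistent with the $\ell_1$-like end of the interpolation described above. All dimensions, step sizes, and the nonnegative sparse target are assumptions made only for this demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined regression: more unknowns (d) than equations (n).
n, d, L = 15, 40, 2
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [2.0, 1.5, 1.0]   # nonnegative sparse target (see docstring note)
y = X @ beta_star

def train_diagonal_net(L, steps=50_000, lr=1e-2, init_scale=1e-2):
    """Depth-L diagonal linear network: prediction = X @ (w_1 * w_2 * ... * w_L).

    Small-step-size gradient descent stands in for gradient flow. All layers
    start at the same small positive constant, which keeps the end-to-end
    coefficients nonnegative throughout training (hence the nonnegative target).
    """
    W = [np.full(d, init_scale) for _ in range(L)]
    for _ in range(steps):
        beta = np.prod(W, axis=0)                # end-to-end linear coefficients
        grad_beta = X.T @ (X @ beta - y) / n     # gradient of 0.5 * mean squared error
        # Chain rule: the gradient w.r.t. layer j multiplies in all other layers.
        W = [W[j] - lr * grad_beta *
             (np.prod([W[k] for k in range(L) if k != j], axis=0) if L > 1 else np.ones(d))
             for j in range(L)]
    return np.prod(W, axis=0)

beta_net = train_diagonal_net(L)
beta_min_l2 = np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-l2-norm interpolant

print("l1 norm, depth-2 diagonal net:", np.abs(beta_net).sum())
print("l1 norm, min-l2 interpolant  :", np.abs(beta_min_l2).sum())
print("training residuals           :",
      np.linalg.norm(X @ beta_net - y), np.linalg.norm(X @ beta_min_l2 - y))
```

Running this typically shows the network's $\ell_1$ norm close to that of the sparse target and well below that of the minimum-$\ell_2$ interpolant, while both nearly fit the training equations.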
