Paper Title
When Do Neural Networks Outperform Kernel Methods?
Paper Authors
Paper Abstract
For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide-network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present the spiked covariates model, which captures in a unified framework both behaviors observed in earlier work. We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
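To make the latent low-dimensional structure mentioned in the abstract concrete, here is a minimal sketch of a spiked-covariates-style setup; the matrix $U$, dimensions $d_0 \ll d$, spike size $r$, and link function $\varphi$ are illustrative assumptions, not the paper's exact definitions. The idea is that the covariates carry most of their variance along a few spiked directions, and the target function depends only on the projection onto those same directions:

$$
x = U z_1 + U^{\perp} z_2, \qquad z_1 \sim \mathcal{N}\!\left(0,\, r^2 I_{d_0}\right), \quad z_2 \sim \mathcal{N}\!\left(0,\, I_{d-d_0}\right), \quad r \gg 1,
$$
$$
y = f_*(x) + \varepsilon, \qquad f_*(x) = \varphi\!\left(U^{\mathsf T} x\right),
$$

where $U \in \mathbb{R}^{d \times d_0}$ has orthonormal columns and $U^{\perp}$ spans its orthogonal complement. When $r$ is large, the covariates align with the low-dimensional subspace on which $f_*$ varies, the regime in which, per the abstract, the curse of dimensionality for RKHS methods becomes milder; when the covariates are nearly isotropic ($r \approx 1$), RKHS methods suffer the full curse of dimensionality while NNs can still learn the relevant low-dimensional representation.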