Paper Title
Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective
Paper Authors
Paper Abstract
Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind this phenomenon remains largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning in a reproducing kernel Hilbert space with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization ability. Numerical results are provided to support our claim.
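The depth degeneracy mentioned in the abstract can be illustrated with a minimal numerical sketch (not from the paper): for an infinite-width ReLU feedforward network, the layer-to-layer correlation of two inputs follows the standard arc-cosine kernel recursion, and iterating it drives every correlation toward 1, i.e., the kernel of a very deep FFNet treats all inputs as nearly identical. The recursion formula used here is a standard result for ReLU networks under NTK-style parameterization, assumed for illustration.

```python
import math

def relu_corr_map(rho):
    # One-layer correlation map for an infinite-width ReLU network
    # (normalized arc-cosine kernel of degree 1):
    #   rho' = (sqrt(1 - rho^2) + (pi - arccos(rho)) * rho) / pi
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point drift
    return (math.sqrt(1.0 - rho * rho) + (math.pi - math.acos(rho)) * rho) / math.pi

# Start from two orthogonal unit inputs (correlation 0) and track the
# correlation as depth grows: it monotonically approaches 1.
rho = 0.0
for layer in range(1, 1001):
    rho = relu_corr_map(rho)
    if layer in (1, 10, 100, 1000):
        print(f"depth {layer:4d}: correlation = {rho:.6f}")
```

The printed correlations climb toward 1 with depth, matching the abstract's claim that the function class induced by the deep-FFNet kernel degenerates; the paper's analysis shows the ResNet kernel avoids this collapse, which this sketch does not attempt to reproduce.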