Paper Title

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Paper Authors

Atsushi Nitanda, Taiji Suzuki

Paper Abstract

We analyze the convergence of the averaged stochastic gradient descent for overparameterized two-layer neural networks for regression problems. It was recently found that a neural tangent kernel (NTK) plays an important role in showing the global convergence of gradient-based methods under the NTK regime, where the learning dynamics for overparameterized neural networks can be almost characterized by that for the associated reproducing kernel Hilbert space (RKHS). However, there is still room for a convergence rate analysis in the NTK regime. In this study, we show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with the global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK. Moreover, we show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate through a smooth approximation of a ReLU network under certain conditions.
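The abstract describes averaged stochastic gradient descent applied to an overparameterized two-layer network in the NTK regime. Below is a minimal, hedged sketch of such a setup: a width-m two-layer network with a smooth (sigmoid) activation, trained by single-sample SGD on a regression problem while keeping a running (Polyak-Ruppert) average of the iterates. The width, step size, activation, fixed output signs, and toy target here are illustrative assumptions, not the paper's exact construction or analysis.

```python
import numpy as np

# Sketch only: averaged SGD on an overparameterized two-layer network,
# in the spirit of the NTK regime discussed in the abstract.
rng = np.random.default_rng(0)

d, m = 5, 2000                       # input dimension, hidden width (overparameterized)
W = rng.normal(size=(m, d))          # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)  # fixed second-layer signs; 1/sqrt(m) output scaling below

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, x):
    # f(x) = (1/sqrt(m)) * sum_j a_j * sigma(w_j . x)
    return (a * sigmoid(W @ x)).sum() / np.sqrt(m)

def target(x):
    # toy regression target (illustrative, not from the paper)
    return np.sin(x[0]) + 0.5 * x[1]

eta, T = 0.5, 20000
W_avg = np.zeros_like(W)

for t in range(1, T + 1):
    x = rng.normal(size=d)
    y = target(x) + 0.1 * rng.normal()   # noisy observation
    s = sigmoid(W @ x)
    pred = (a * s).sum() / np.sqrt(m)
    # gradient of the squared loss 0.5*(pred - y)^2 w.r.t. first-layer weights
    grad = (pred - y) * (a * s * (1 - s))[:, None] * x[None, :] / np.sqrt(m)
    W = W - eta * grad
    W_avg += (W - W_avg) / t             # running (Polyak-Ruppert) average of iterates

x_test = rng.normal(size=d)
print("averaged-iterate prediction:", forward(W_avg, x_test), "target:", target(x_test))
```

The averaged iterate `W_avg`, rather than the last iterate `W`, is the quantity whose generalization behavior the paper studies; the averaging step above is the standard running-mean update, while the paper's rates additionally depend on the complexity of the target function and of the RKHS induced by the NTK.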
