Paper Title
Avoiding Spurious Local Minima in Deep Quadratic Networks
Paper Authors
Paper Abstract
Despite their practical success, a theoretical understanding of the loss landscape of neural networks has proven challenging due to the high-dimensional, non-convex, and highly nonlinear structure of such models. In this paper, we characterize the training landscape of the mean squared error loss for neural networks with quadratic activation functions. We prove the existence of spurious local minima and saddle points which can be escaped easily with probability one when the number of neurons is greater than or equal to the input dimension and the norm of the training samples is used as a regressor. We prove that deep overparameterized neural networks with quadratic activations benefit from similarly benign landscape properties. Our theoretical results are independent of the data distribution and fill an existing gap in the theory of two-layer quadratic neural networks. Finally, we empirically demonstrate convergence to a global minimum for these problems.
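To make the setting concrete, here is a minimal NumPy sketch (illustrative, not the authors' code) of the two-layer model the abstract describes: a quadratic network y_hat(x) = sum_j (w_j^T x)^2 with the squared norm of each sample added as the extra regressor, trained on the mean squared error by plain gradient descent. All names and hyperparameters (W, b, lr, the planted-target setup) are assumptions made for this sketch, not taken from the paper.

```python
# Minimal sketch (illustrative, not the authors' code) of a two-layer
# quadratic network trained with gradient descent on the MSE loss.
# Model: y_hat(x) = sum_j (w_j^T x)^2 + b * ||x||^2, where the squared
# norm of each training sample serves as the extra "norm regressor".
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 5, 5                        # samples, input dim, neurons (k >= d)

X = rng.standard_normal((n, d))
W_true = rng.standard_normal((k, d)) / np.sqrt(d)
y = np.sum((X @ W_true.T) ** 2, axis=1)    # planted quadratic targets

W = 0.1 * rng.standard_normal((k, d))      # trainable hidden-layer weights
b = 0.0                                    # coefficient on the norm regressor
lr = 1e-3                                  # illustrative step size

for step in range(5000):
    H = X @ W.T                            # pre-activations w_j^T x_i, shape (n, k)
    y_hat = np.sum(H ** 2, axis=1) + b * np.sum(X ** 2, axis=1)
    r = y_hat - y                          # residuals
    # Gradients of the MSE loss (1/n) * sum_i r_i^2 w.r.t. W and b
    grad_W = (4.0 / n) * (H * r[:, None]).T @ X
    grad_b = (2.0 / n) * (r @ np.sum(X ** 2, axis=1))
    W -= lr * grad_W
    b -= lr * grad_b

print("final MSE:", np.mean(r ** 2))       # should shrink toward zero
```

Under the abstract's condition that the number of neurons is at least the input dimension (k >= d here), the stated landscape result suggests gradient descent should escape saddle points with probability one; the final MSE printed above is a quick empirical check in that spirit.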