Paper Title

Implicit Regularization in ReLU Networks with the Square Loss

Authors

Gal Vardi, Ohad Shamir

Abstract

Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018]. Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and they provide some clues on what this framework should be.
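
To make the abstract's setting concrete, here is a brief sketch; the notation below is chosen for illustration and is not taken from the paper. For a single ReLU neuron $x \mapsto [\langle w, x \rangle]_+$ trained on data $(x_1, y_1), \dots, (x_n, y_n)$ with the square loss

$$L(w) = \frac{1}{n} \sum_{i=1}^{n} \big( [\langle w, x_i \rangle]_+ - y_i \big)^2,$$

the implicit-regularization question asks whether there is an explicit function $R(w)$ of the parameters such that, among the solutions fitting the data, gradient descent always converges to one minimizing $R(w)$; the abstract's negative result states that no such function exists in general. The "balancedness" property of Du et al. [2018] refers to the fact that for a one-hidden-layer ReLU network $x \mapsto \sum_j v_j [\langle w_j, x \rangle]_+$, gradient flow preserves the quantities

$$\|w_j(t)\|^2 - v_j(t)^2 \qquad \text{for every hidden neuron } j,$$

so the incoming and outgoing weights of each neuron remain balanced throughout training.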
