Paper Title

Gradient descent follows the regularization path for general losses

Paper Authors

Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

Abstract

Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with arbitrary convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the algorithm-independent regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.

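The abstract's central claim, that on separable data the gradient-descent path and the norm-constrained regularization path converge to the same direction, can be illustrated numerically. The sketch below is not the paper's construction; it is a minimal illustration assuming 2-D linearly separable data and the logistic loss, and the helper names (`risk_grad`, `reg_path_direction`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable 2-D data: labels y_i in {-1, +1}, with y_i <x_i, w> > 0 achievable.
X = np.vstack([rng.normal([2.0, 2.0], 0.5, size=(20, 2)),
               rng.normal([-2.0, -2.0], 0.5, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

def risk_grad(w):
    # Gradient of the empirical logistic risk R(w) = (1/n) sum_i log(1 + exp(-y_i <x_i, w>)).
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))
    return (X * coef[:, None]).mean(axis=0)

# Gradient-descent path: on separable data ||w_t|| diverges, so we track the direction w_t / ||w_t||.
w = np.zeros(2)
for _ in range(20000):
    w -= 0.2 * risk_grad(w)
gd_direction = w / np.linalg.norm(w)

def reg_path_direction(B, steps=20000, lr=0.2):
    # Regularization path: minimize R(w) subject to ||w|| <= B via projected gradient descent,
    # then normalize; as B grows, this direction should approach the gradient-descent direction.
    v = np.zeros(2)
    for _ in range(steps):
        v -= lr * risk_grad(v)
        norm = np.linalg.norm(v)
        if norm > B:
            v *= B / norm  # project back onto the norm ball of radius B
    return v / np.linalg.norm(v)

for B in [1.0, 5.0, 20.0]:
    d = reg_path_direction(B)
    angle = np.degrees(np.arccos(np.clip(d @ gd_direction, -1.0, 1.0)))
    print(f"B = {B:5.1f}: angle to gradient-descent direction = {angle:.3f} degrees")
```

Under these assumptions, the printed angle should shrink as the norm budget B grows, matching the abstract's statement that the two paths align in direction for an exponentially-tailed loss such as the logistic loss.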