Paper Title
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Paper Authors
Paper Abstract
As part of the effort to understand the implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all the previous results in this setting. It is shown that gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function. Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization. The latter result relies upon Nash's embedding theorem.
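The equivalence the abstract describes can be illustrated with the best-known special case (a sketch, not taken from the paper's text): the quadratic reparametrization x = u ⊙ u. Gradient flow on u for the loss f(u ⊙ u) induces the dynamics dx/dt = -4 x ⊙ ∇f(x), which is exactly continuous mirror descent on x with the entropy-like Legendre function Φ(x) = (1/4) Σᵢ (xᵢ log xᵢ − xᵢ), since ∇Φ(x) = (1/4) log x. The toy loss, initial point, and step size below are illustrative choices; discretizing both flows with a small step shows the two trajectories track each other:

```python
import numpy as np

def grad_f(x, a):
    """Gradient of the toy loss f(x) = 0.5 * ||x - a||^2."""
    return x - a

a = np.array([1.0, 2.0])   # target kept positive so x stays in Φ's domain
u = np.array([0.5, 1.5])   # parameters of the reparametrized model
x_md = u * u               # mirror-descent iterate, started at the same x

eta, steps = 1e-4, 20000   # small step ≈ continuous-time flow
for _ in range(steps):
    # Gradient step on u for the loss f(u ⊙ u): ∇_u = 2 u ⊙ ∇f(u ⊙ u)
    u = u - eta * 2.0 * u * grad_f(u * u, a)
    # Mirror step on x: ∇Φ(x⁺) = ∇Φ(x) − η ∇f(x) ⇒ x⁺ = x ⊙ exp(−4η ∇f(x))
    x_md = x_md * np.exp(-4.0 * eta * grad_f(x_md, a))

x_gd = u * u               # map the gradient-descent iterate back to x-space
print(x_gd, x_md)          # the two trajectories nearly coincide
```

Both updates discretize the same continuous dynamics in x-space, so as the step size shrinks they converge to the same curve; this is the one-dimensional-per-coordinate instance of the commuting-parametrization phenomenon the paper characterizes in general.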