Paper Title
Batch Normalization Is Blind to the First and Second Derivatives of the Loss
Authors
Abstract
In this paper, we prove how the batch normalization (BN) operation affects the back-propagation of the first and second derivatives of the loss. Taking the Taylor series expansion of the loss function, we prove that the BN operation blocks the influence of the first-order term and most of the influence of the second-order term of the loss. We also find that this problem is caused by the standardization phase of the BN operation. Experimental results verify our theoretical conclusions and show that the BN operation significantly affects feature representations in specific tasks where the losses of different samples share similar analytic formulas.
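The blocking effect can be illustrated numerically in a simplified special case. The sketch below is not the paper's proof. It assumes per-feature standardization over the batch with no epsilon and no affine scale or shift, and a loss whose first-order and second-order coefficients (`a` and `b`) are identical for every sample. The names `standardize`, `shared_quadratic_loss`, and `numerical_grad` are hypothetical helpers introduced only for this example. Because the standardized outputs sum to zero and their squares sum to the batch size, such a loss is constant in the input, so no gradient reaches the features before the standardization.

```python
import numpy as np

def standardize(x):
    """Standardization phase of BN over the batch axis (assumed: no epsilon, no affine)."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)  # population (biased) std over the batch
    return (x - mean) / std

def shared_quadratic_loss(x, a, b):
    """Sum over samples of a*y + b*y^2, with the same a and b for every sample."""
    y = standardize(x)
    return float(np.sum(a * y + b * y ** 2))

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = h
        grad[idx] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # batch of 8 samples, 3 features
a, b = 0.7, -1.3              # illustrative shared coefficients

loss_value = shared_quadratic_loss(x, a, b)
grad = numerical_grad(lambda z: shared_quadratic_loss(z, a, b), x)
max_abs_grad = float(np.abs(grad).max())

# The loss equals b * (number of entries) regardless of x,
# and its gradient with respect to x vanishes up to numerical error.
print(loss_value, b * x.size, max_abs_grad)
```

Changing the random seed or the scale of `x` leaves `loss_value` unchanged, which is the sense in which the standardization is blind to these terms in this restricted setting. The abstract states that only most of the second-order influence is blocked, so second-order terms beyond this shared-coefficient case are not covered by the sketch.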