Paper Title

How Does BN Increase Collapsed Neural Network Filters?

Paper Authors

Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang

Paper Abstract

Improving the sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g., ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularization such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and thereby reduces network capacity. It becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise, and that the sparsifying probability is proportional to the square of the learning rate and inversely proportional to the square of the BN scale parameter. To prevent undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while automatically making BN parameters trainable again as they saturate during training. With psBN, we can recover collapsed filters and improve model performance on various tasks, such as classification on CIFAR-10 and object detection on MS-COCO2017.
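As an illustration of the filter-collapse phenomenon described in the abstract (this is not code from the paper), the sketch below counts BN channels whose scale parameter γ has shrunk to near zero while the shift β is non-positive: under BN followed by ReLU, such a channel outputs approximately nothing and receives no gradient, so its filter is effectively collapsed. The function name, the threshold `tol`, and the PyTorch setting are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

def count_collapsed_filters(model: nn.Module, tol: float = 1e-3):
    """Rough diagnostic for filter collapse in BN+ReLU networks.

    A BN channel whose scale (gamma) is ~0 and whose shift (beta) is <= 0
    produces a pre-activation stuck at or below zero, so the following ReLU
    outputs (almost) nothing and the corresponding filter is effectively dead.
    """
    total, collapsed = 0, 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight.detach()
            beta = m.bias.detach()
            dead = (gamma.abs() < tol) & (beta <= 0)  # channel output <= 0 before ReLU
            total += gamma.numel()
            collapsed += int(dead.sum())
    return collapsed, total

# Hypothetical usage on any BN+ReLU CNN, e.g. a torchvision ResNet:
# from torchvision.models import resnet18
# c, t = count_collapsed_filters(resnet18(weights=None))
# print(f"{c}/{t} BN channels collapsed")
```

According to the abstract, the fraction reported by such a diagnostic would grow with larger learning rates (sparsifying probability ∝ LR²) and shrink for channels with larger BN scale parameters (∝ 1/γ²), which is the regime psBN is designed to recover from.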
