Paper Title
Training Thinner and Deeper Neural Networks: Jumpstart Regularization
Paper Authors
Paper Abstract
Neural networks are more expressive when they have multiple layers. In turn, conventional training methods are only successful if the depth does not lead to numerical issues such as exploding or vanishing gradients, which occur less frequently when the layers are sufficiently wide. However, increasing width to attain greater depth entails the use of heavier computational resources and leads to overparameterized models. These issues have been partially addressed by model compression methods such as quantization and pruning, some of which rely on normalization-based regularization of the loss function to make the effect of most parameters negligible. In this work, we propose instead to use regularization for preventing neurons from dying or becoming linear, a technique which we denote as jumpstart regularization. In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.
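The abstract does not spell out the regularizer itself, so the sketch below is only an illustrative assumption of what a penalty against dead or linear ReLU units could look like, not the paper's actual formulation. The function name `jumpstart_penalty`, the `margin` parameter, and the 0.1 weighting coefficient are hypothetical choices for this example.

```python
import torch
import torch.nn as nn

def jumpstart_penalty(pre_activations: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # Hypothetical sketch (the paper's exact penalty is not given in the abstract).
    # pre_activations: (batch_size, num_units) inputs to a ReLU layer.
    # A unit whose pre-activations are all negative over the batch is "dead";
    # one whose pre-activations are all positive behaves linearly.
    pos = torch.relu(pre_activations)   # positive part per sample/unit
    neg = torch.relu(-pre_activations)  # negative part per sample/unit
    # Hinge terms: penalize units whose strongest positive (or negative)
    # response over the batch stays below the margin, i.e. units that never
    # fire (dead) or never switch off (linear).
    dead_term = torch.relu(margin - pos.max(dim=0).values)
    linear_term = torch.relu(margin - neg.max(dim=0).values)
    return (dead_term + linear_term).mean()

# Usage: add the penalty, scaled by a coefficient, to the task loss.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
pre_act = model[0](x)  # pre-activations feeding the ReLU layer
loss = nn.functional.cross_entropy(model(x), y) + 0.1 * jumpstart_penalty(pre_act)
loss.backward()
```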