Paper Title
Revisiting Initialization of Neural Networks
Paper Authors
Paper Abstract
The proper initialization of weights is crucial for the effective training and fast convergence of deep neural networks (DNNs). Prior work in this area has mostly focused on balancing the variance among weights per layer to maintain stability of (i) the input data propagated forwards through the network and (ii) the loss gradients propagated backwards, respectively. This prevalent heuristic is however agnostic of dependencies among gradients across the various layers and captures only first-order effects. In this paper, we propose and discuss an initialization principle that is based on a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. The proposed approach is more systematic and recovers previous results for DNN activations such as smooth functions, dropouts, and ReLU. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool which helps to more rigorously initialize weights.
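The abstract does not spell out how the Hessian norm is estimated in practice. The sketch below shows one standard way to track such a quantity at initialization, assuming PyTorch: power iteration on Hessian-vector products obtained via double backpropagation. The helper name hessian_spectral_norm, the toy MLP, and the random batch are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def hessian_spectral_norm(loss, params, iters=20):
    """Estimate the spectral norm of the Hessian of `loss` w.r.t. `params`
    by power iteration on Hessian-vector products (double backprop)."""
    # Gradients with a graph attached so we can differentiate them again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Random unit starting vector, shaped like the parameter list.
    v = [torch.randn_like(p) for p in params]
    v_norm = torch.sqrt(sum((x ** 2).sum() for x in v))
    v = [x / v_norm for x in v]
    est = torch.tensor(0.0)
    for _ in range(iters):
        # Hessian-vector product: differentiate <grads, v> w.r.t. params.
        hv = torch.autograd.grad(
            sum((g * x).sum() for g, x in zip(grads, v)),
            params, retain_graph=True)
        est = torch.sqrt(sum((h ** 2).sum() for h in hv))  # ||Hv||
        v = [h / (est + 1e-12) for h in hv]                # next iterate
    return est.item()

# Usage: inspect the Hessian norm of a freshly initialized toy MLP
# (hypothetical setup, for illustration only).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss = nn.functional.cross_entropy(model(x), y)
print(hessian_spectral_norm(loss, list(model.parameters())))
```

Comparing this estimate across candidate initialization schemes, before any training step, is the kind of diagnostic use the abstract describes.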