Paper Title

Deep Gated Networks: A framework to understand training and generalisation in deep learning

Authors

Chandrashekar Lakshminarayanan, Amit Vikram Singh

Abstract

Understanding the role of (stochastic) gradient descent (SGD) in the training and generalisation of deep neural networks (DNNs) with ReLU activation has been an object of study in the recent past. In this paper, we make use of deep gated networks (DGNs) as a framework to obtain insights about DNNs with ReLU activation. In a DGN, a single neuronal unit has two components, namely the pre-activation input (equal to the inner product of the layer weights and the previous layer outputs) and a gating value belonging to $[0,1]$; the output of the neuronal unit is the product of the pre-activation input and the gating value. The standard DNN with ReLU activation is a special case of a DGN, wherein the gating value is $1/0$ depending on whether the pre-activation input is positive or negative. We theoretically analyse and experiment with several variants of DGNs, each variant suited to understanding a particular aspect of either training or generalisation in DNNs with ReLU activation. Our theory sheds light on two questions, namely i) why increasing depth up to a point helps training, and ii) why increasing depth beyond that point hurts training. We also present experimental evidence showing that gate adaptation, i.e., the change of gating values through the course of training, is key to generalisation.
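As a rough illustration (not taken from the paper), the sketch below shows a single DGN-style layer in NumPy: the pre-activation input is the inner product of the layer weights and the previous layer's output, the gate returns values in $[0,1]$, and the layer output is their elementwise product. The ReLU DNN appears as the special case where the gate is $1$ when the pre-activation is positive and $0$ otherwise; the `soft_gate` shown is a hypothetical non-binary gate, included only for contrast.

```python
import numpy as np

def dgn_layer(x, W, gate_fn):
    """One deep gated network (DGN) layer (illustrative sketch).

    Pre-activation: q = W @ x (inner product of the layer weights and the
    previous layer output). Output: q * g, where g = gate_fn(q) lies in [0, 1].
    """
    q = W @ x          # pre-activation input
    g = gate_fn(q)     # gating values in [0, 1]
    return q * g       # elementwise product of pre-activation and gate

# Special case: ReLU DNN, where the gate is 1 if the pre-activation is
# positive and 0 otherwise.
relu_gate = lambda q: (q > 0).astype(q.dtype)

# Hypothetical soft gate whose values lie strictly in (0, 1).
soft_gate = lambda q: 1.0 / (1.0 + np.exp(-4.0 * q))

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W = rng.standard_normal((3, 5))

print(dgn_layer(x, W, relu_gate))  # identical to np.maximum(W @ x, 0)
print(dgn_layer(x, W, soft_gate))  # gated variant with non-binary gates
```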
