Title

Gaussian Pre-Activations in Neural Networks: Myth or Reality?

Authors

Pierre Wolinski, Julyan Arbel

Abstract

The study of feature propagation at initialization in neural networks lies at the root of numerous initialization designs. An assumption very commonly made in the field states that the pre-activations are Gaussian. Although this convenient Gaussian hypothesis can be justified when the number of neurons per layer tends to infinity, it is challenged by both theoretical and experimental works for finite-width neural networks. Our major contribution is to construct a family of pairs of activation functions and initialization distributions that ensure that the pre-activations remain Gaussian throughout the network's depth, even in narrow neural networks. In the process, we discover a set of constraints that a neural network should fulfill to ensure Gaussian pre-activations. Additionally, we provide a critical review of the claims of the Edge of Chaos line of work and build an exact Edge of Chaos analysis. We also propose a unified view of pre-activation propagation, encompassing the framework of several well-known initialization procedures. Finally, our work provides a principled framework for answering the much-debated question: is it desirable to initialize the training of a neural network whose pre-activations are ensured to be Gaussian? Our code is available on GitHub: https://github.com/p-wol/gaussian-preact/.
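The abstract's claim that finite width challenges the Gaussian hypothesis is easy to probe empirically. Below is a minimal NumPy sketch, independent of the authors' repository, that propagates Gaussian inputs through a narrow tanh network with i.i.d. Gaussian weights and tracks the excess kurtosis of one unit's pre-activations across depth; all choices here (width 8, depth 30, tanh, weight variance 1/fan_in) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code): check whether pre-activations
# stay Gaussian in a narrow, deep tanh network at a standard i.i.d.
# Gaussian initialization. All hyperparameters below are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

width, depth, n_samples = 8, 30, 100_000  # narrow and deep on purpose
h = rng.standard_normal((n_samples, width))  # Gaussian inputs

for layer in range(1, depth + 1):
    # i.i.d. Gaussian weights with variance 1 / fan_in (assumed setup)
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    z = h @ W.T        # pre-activations of this layer
    h = np.tanh(z)     # activations fed to the next layer
    if layer % 10 == 0:
        # Excess kurtosis of a true Gaussian is 0; a drift away from 0
        # signals that the pre-activations are no longer Gaussian.
        k = stats.kurtosis(z[:, 0])
        print(f"layer {layer:3d}: excess kurtosis of one unit = {k:+.3f}")
```

Since a Gaussian has excess kurtosis 0, a value drifting away from 0 as depth grows illustrates the finite-width departure from Gaussianity that the paper's activation/initialization pairs are designed to prevent.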
