Paper Title
Tensor Programs II: Neural Tangent Kernel for Any Architecture
Paper Authors
Paper Abstract
We prove that a randomly initialized neural network of *any architecture* has its Neural Tangent Kernel (NTK) converge to a deterministic limit, as the network widths tend to infinity. We demonstrate how to calculate this limit. In prior literature, the heuristic study of neural network gradients often assumes every weight matrix used in forward propagation is independent of its transpose used in backpropagation (Schoenholz et al. 2017). This is known as the *gradient independence assumption (GIA)*. We identify a commonly satisfied condition, which we call the *Simple GIA Check*, such that the NTK limit calculation based on GIA is correct. Conversely, when the Simple GIA Check fails, we show GIA can result in wrong answers. Our material here presents the NTK results of Yang (2019a) in a friendly manner and showcases the *tensor programs* technique for understanding wide neural networks. We provide reference implementations of infinite-width NTKs for recurrent neural networks, transformers, and batch normalization at https://github.com/thegregyang/NTK4A.
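To make the central claim concrete, below is a minimal numerical sketch (not taken from the paper or its NTK4A repository) of the empirical NTK, Theta(x, x') = <grad_theta f(x), grad_theta f(x')>, for a toy two-hidden-layer ReLU MLP in the NTK parameterization. The architecture, widths, and inputs are illustrative assumptions; the point is only that the kernel entry concentrates around a fixed value as the width grows, consistent with convergence to a deterministic limit.

```python
# Hedged sketch: empirical NTK of a random 2-hidden-layer ReLU MLP
# in NTK parameterization (weights ~ N(0,1), each layer scaled by 1/sqrt(fan_in)).
# As width grows, the NTK entry Theta(x1, x2) should concentrate (std shrinks).
import numpy as np

def empirical_ntk_entry(width, x1, x2, seed):
    """Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> for one random network."""
    rng = np.random.default_rng(seed)
    d = x1.shape[0]
    W1 = rng.standard_normal((width, d))
    W2 = rng.standard_normal((width, width))
    w3 = rng.standard_normal(width)

    def forward_and_grad(x):
        h1 = W1 @ x / np.sqrt(d); a1 = np.maximum(h1, 0.0)
        h2 = W2 @ a1 / np.sqrt(width); a2 = np.maximum(h2, 0.0)
        f = w3 @ a2 / np.sqrt(width)
        # Manual backprop to collect gradients w.r.t. all weights.
        g_a2 = w3 / np.sqrt(width)
        g_h2 = g_a2 * (h2 > 0)
        g_a1 = W2.T @ g_h2 / np.sqrt(width)
        g_h1 = g_a1 * (h1 > 0)
        grads = np.concatenate([
            np.outer(g_h1, x / np.sqrt(d)).ravel(),   # dW1
            np.outer(g_h2, a1 / np.sqrt(width)).ravel(),  # dW2
            a2 / np.sqrt(width),                       # dw3
        ])
        return f, grads

    _, g1 = forward_and_grad(x1)
    _, g2 = forward_and_grad(x2)
    return g1 @ g2

x1 = np.array([1.0, 0.5, -0.3]); x2 = np.array([0.2, -1.0, 0.7])
for n in [64, 256, 1024, 4096]:
    vals = [empirical_ntk_entry(n, x1, x2, s) for s in range(5)]
    print(n, np.mean(vals), np.std(vals))  # std over seeds shrinks with width
```

This toy MLP is simple enough that the classical GIA-based computation and the rigorous tensor-programs result agree; the paper's contribution is handling architectures (RNNs, transformers, batch norm) where such a naive calculation may not obviously apply.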