Paper title
Smaller generalization error derived for a deep residual neural network compared to shallow networks
Paper authors
Paper abstract
Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier feature layers $\bar z_{\ell+1}=\bar z_\ell + \mathrm{Re}\sum_{k=1}^K\bar b_{\ell k}e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}+ \mathrm{Re}\sum_{k=1}^K\bar c_{\ell k}e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate ${\|\hat f\|^2_{L^1(\mathbb{R}^d)}}/{(KL)}$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case that the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
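To make the layer update concrete, here is a minimal NumPy sketch of the forward pass through the $L$ residual random-Fourier-feature layers. It is an illustration under stated assumptions, not the paper's method: the frequencies are drawn from a standard normal distribution rather than the optimal distribution derived in the paper, the amplitudes $\bar b_{\ell k}, \bar c_{\ell k}$ are random placeholders instead of trained values, and the initialization $\bar z_0 = 0$, the amplitude scaling, and the dimensions $d$, $K$, $L$ are all chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, L = 4, 16, 8  # input dimension, nodes per layer, number of layers (placeholders)

# Frequencies: sampled here from a standard normal distribution for illustration;
# the paper derives an optimal sampling distribution instead.
omega = rng.standard_normal((L, K))          # scalar frequencies in exp(i*omega*z)
omega_p = rng.standard_normal((L, K, d))     # vector frequencies in exp(i*omega'.x)

# Complex amplitudes b, c: random placeholders standing in for trained values.
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)

def forward(x, z0=0.0):
    """Evaluate the residual network at x in R^d via the update
    z_{l+1} = z_l + Re sum_k b_{lk} e^{i omega_{lk} z_l}
                  + Re sum_k c_{lk} e^{i omega'_{lk} . x}."""
    z = z0
    for l in range(L):
        z = (z
             + np.real(np.sum(b[l] * np.exp(1j * omega[l] * z)))
             + np.real(np.sum(c[l] * np.exp(1j * (omega_p[l] @ x)))))
    return z

print(forward(rng.standard_normal(d)))
```

Note that each layer mixes a term depending on the current state $\bar z_\ell$ with a term depending directly on the input $x$, which is what distinguishes this residual architecture from a single-hidden-layer random Fourier feature model with $KL$ nodes.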