论文标题
部分可观测时空混沌系统的无模型预测
The Mean Dimension of Neural Networks -- What causes the interaction effects?
论文作者
论文摘要
储层计算是预测湍流的有力工具,其简单的架构具有处理大型系统的计算效率。然而,其实现通常需要完整的状态向量测量和系统非线性知识。我们使用非线性投影函数将系统测量扩展到高维空间,然后将其输入到储层中以获得预测。我们展示了这种储层计算网络在时空混沌系统上的应用,该系统模拟了湍流的若干特征。我们表明,使用径向基函数作为非线性投影器,即使只有部分观测并且不知道控制方程,也能稳健地捕捉复杂的系统非线性。最后,我们表明,当测量稀疏、不完整且带有噪声,甚至控制方程变得不准确时,我们的网络仍然可以产生相当准确的预测,从而为实际湍流系统的无模型预测铺平了道路。
Owen and Hoyt recently showed that the effective dimension offers key structural information about the input-output mapping underlying an artificial neural network. Along this line of research, this work proposes an estimation procedure that allows the calculation of the mean dimension from a given dataset, without resampling from external distributions. The design yields total indices when features are independent and a variant of total indices when features are correlated. We show that this variant possesses the zero independence property. With synthetic datasets, we analyse how the mean dimension evolves layer by layer and how the activation function impacts the magnitude of interactions. We then use the mean dimension to study some of the most widely employed convolutional architectures for image recognition (LeNet, ResNet, DenseNet). To account for pixel correlations, we propose calculating the mean dimension after the addition of an inverse PCA layer that allows one to work on uncorrelated PCA-transformed features, without the need to retrain the neural network. We use the generalized total indices to produce heatmaps for post-hoc explanations, and we employ the mean dimension on the PCA-transformed features for cross comparisons of the artificial neural networks structures. Results provide several insights on the difference in magnitude of interactions across the architectures, as well as indications on how the mean dimension evolves during training.