Paper Title

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics

Paper Authors

Suzuki, Taiji

Paper Abstract

We introduce a new theoretical framework to analyze deep learning optimization in connection with its generalization error. Existing frameworks, such as mean-field theory and neural tangent kernel theory for neural network optimization analysis, typically require taking the infinite-width limit of the network to show global convergence. This potentially makes it difficult to deal directly with finite-width networks; in the neural tangent kernel regime especially, we cannot reveal favorable properties of neural networks beyond kernel methods. To realize a more natural analysis, we consider a completely different approach in which we formulate parameter training as transportation map estimation and show its global convergence via the theory of infinite-dimensional Langevin dynamics. This enables us to analyze narrow and wide networks in a unified manner. Moreover, we give generalization gap and excess risk bounds for the solution obtained by the dynamics. The excess risk bound achieves the so-called fast learning rate; in particular, we show exponential convergence for a classification problem and a minimax optimal rate for a regression problem.
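In its simplest finite-dimensional discretization, Langevin-dynamics training is the standard noisy-gradient update

$$X_{k+1} = X_k - \eta \nabla L(X_k) + \sqrt{2\eta/\beta}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I),$$

where $\eta$ is the step size, $\beta$ the inverse temperature, and $L$ the empirical loss. The sketch below is a hedged illustration of this generic update on a small two-layer network; it is not the paper's infinite-dimensional construction, and all variable names and hyperparameter values are hypothetical choices for the example.

```python
# Minimal sketch: gradient Langevin dynamics on a two-layer network
# (finite-dimensional stand-in for the infinite-dimensional dynamics
# discussed in the abstract; all settings here are illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (assumption: 1-D inputs, noisy sine target).
n, d, width = 200, 1, 16
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Two-layer network f(x) = a^T tanh(W x).
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

eta, beta, steps = 1e-2, 1e4, 5000  # step size, inverse temperature, iterations

for _ in range(steps):
    h = np.tanh(X @ W.T)          # (n, width) hidden activations
    pred = h @ a                  # (n,) network outputs
    resid = pred - y              # regression residuals
    # Gradients of the empirical squared loss (1/(2n)) * sum(resid^2).
    grad_a = h.T @ resid / n
    grad_W = ((resid[:, None] * a[None, :] * (1 - h**2)).T @ X) / n
    # Langevin update: gradient step plus Gaussian noise of scale sqrt(2*eta/beta).
    a -= eta * grad_a + np.sqrt(2 * eta / beta) * rng.standard_normal(a.shape)
    W -= eta * grad_W + np.sqrt(2 * eta / beta) * rng.standard_normal(W.shape)

print("final training MSE:", np.mean((np.tanh(X @ W.T) @ a - y) ** 2))
```

Taking $\beta \to \infty$ removes the noise and recovers plain gradient descent; it is the injected Gaussian noise that lets Langevin-type analyses establish convergence to a global optimum rather than a local one.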
