Paper Title
Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer
Paper Authors
Paper Abstract
Several problems in stochastic analysis are defined through their geometry, and preserving that geometric structure is essential to generating meaningful predictions. Nevertheless, how to design principled deep learning (DL) models capable of encoding these geometric structures remains largely unknown. We address this open problem by introducing a universal causal geometric DL framework in which the user specifies a suitable pair of metric spaces $\mathscr{X}$ and $\mathscr{Y}$ and our framework returns a DL model capable of causally approximating any ``regular'' map sending time series in $\mathscr{X}^{\mathbb{Z}}$ to time series in $\mathscr{Y}^{\mathbb{Z}}$ while respecting their forward flow of information throughout time. Suitable geometries on $\mathscr{Y}$ include various (adapted) Wasserstein spaces arising in optimal stopping problems, a variety of statistical manifolds describing the conditional distribution of continuous-time finite state Markov chains, and all Fréchet spaces admitting a Schauder basis, e.g. as in classical finance. Suitable spaces $\mathscr{X}$ are compact subsets of any Euclidean space. Our results all quantitatively express the number of parameters needed for our DL model to achieve a given approximation error as a function of the target map's regularity and of the geometric structure both of $\mathscr{X}$ and of $\mathscr{Y}$. Even when omitting any temporal structure, our universal approximation theorems are the first guarantees that Hölder functions defined between such $\mathscr{X}$ and $\mathscr{Y}$ can be approximated by DL models.
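The two key notions invoked in the abstract, Hölder regularity between metric spaces and causality of a sequence-to-sequence map, can be stated in standard form as follows (the notation here is a conventional formulation, not taken verbatim from the paper):

```latex
% A map f : \mathscr{X} \to \mathscr{Y} between metric spaces
% is \alpha-H\"older (0 < \alpha \le 1) with constant L > 0 if
\[
  d_{\mathscr{Y}}\bigl(f(x_1), f(x_2)\bigr)
    \le L \, d_{\mathscr{X}}(x_1, x_2)^{\alpha}
  \qquad \text{for all } x_1, x_2 \in \mathscr{X}.
\]
% A sequence-to-sequence map F : \mathscr{X}^{\mathbb{Z}} \to
% \mathscr{Y}^{\mathbb{Z}} is causal (respects the forward flow of
% information) if the output at time t depends only on inputs up to t:
\[
  x_s = \tilde{x}_s \ \text{for all } s \le t
  \quad \Longrightarrow \quad
  F(x)_t = F(\tilde{x})_t .
\]
```

Read together, the abstract's main claim is a quantitative universal approximation guarantee for causal maps whose per-time-step behavior is Hölder in this sense.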