Paper Title
Posterior and Computational Uncertainty in Gaussian Processes
Paper Authors
Paper Abstract
Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
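The decomposition claimed in point (ii) can be illustrated numerically. The sketch below is an assumption-laden toy construction, not the authors' reference implementation: it uses an RBF kernel, Gaussian observation noise, and coordinate-vector "actions" (which corresponds to a partial Cholesky-style approximation); names such as `combined_cov` and `comp_cov` are illustrative only.

```python
import numpy as np

# Minimal sketch of a computation-aware GP posterior under the assumptions above.

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel between row-wise inputs A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
Xs = np.linspace(-3, 3, 100)[:, None]      # test inputs
noise = 0.1**2

K = rbf(X, X) + noise * np.eye(len(X))     # K_hat = K + sigma^2 I
Kxs = rbf(Xs, X)                           # k(x*, X)
Kss = rbf(Xs, Xs)                          # k(x*, x*)

i = 10                                     # computation budget: number of "actions"
S = np.eye(len(X))[:, :i]                  # coordinate-vector actions s_1, ..., s_i
C = S @ np.linalg.solve(S.T @ K @ S, S.T)  # low-rank approximation of K_hat^{-1}
                                           # built from the i actions

mean_i = Kxs @ C @ y                       # approximate posterior mean after i actions
combined_cov = Kss - Kxs @ C @ Kxs.T       # combined posterior covariance

K_inv = np.linalg.inv(K)
math_cov = Kss - Kxs @ K_inv @ Kxs.T       # mathematical (exact-posterior) covariance
comp_cov = Kxs @ (K_inv - C) @ Kxs.T       # computational covariance from stopping at i

# The combined covariance splits exactly into mathematical + computational parts.
assert np.allclose(combined_cov, math_cov + comp_cov)
```

In this toy construction the split holds by algebra; spending more computation (larger `i`) shrinks `comp_cov` toward zero, recovering the exact GP posterior, while small `i` leaves extra predictive variance that reflects the unfinished linear solve.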