Paper Title

Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders

Authors

Jian Yan, Xianyang Zhang

Abstract

Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and the sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with isotropic kernels, which includes MMD with the Gaussian and Laplace kernels and the energy distance as special cases. We derive asymptotic expansions of the kernel two-sample statistics, based on which we establish central limit theorems (CLTs) under the null hypothesis and under both local and fixed alternatives. The new non-null CLT results allow us to perform an asymptotically exact power analysis, which reveals a delicate interplay between the moment discrepancy that the kernel two-sample tests can detect and the dimension-and-sample orders. The asymptotic theory is further corroborated through numerical studies.
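To make the statistics named in the abstract concrete, the following is a minimal sketch, assuming NumPy, of the standard unbiased (U-statistic) estimate of the squared MMD for three isotropic kernels. The function names, the bandwidth choice, and the simulated data are illustrative assumptions, not taken from the paper; the paper's exact normalization and its CLT-based test are not reproduced here.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a and the rows of b."""
    sq = (a * a).sum(axis=1)[:, None] + (b * b).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off


# Each factory returns a kernel written as a function of the squared distance.
def gaussian_kernel(bandwidth):
    return lambda sq: np.exp(-sq / (2.0 * bandwidth**2))


def laplace_kernel(bandwidth):
    return lambda sq: np.exp(-np.sqrt(sq) / bandwidth)


def energy_kernel():
    # With k(x, y) = -||x - y||, the squared MMD equals the energy distance.
    return lambda sq: -np.sqrt(sq)


def mmd2_unbiased(x, y, kernel):
    """Unbiased estimate of MMD^2 between samples x (n x d) and y (m x d)."""
    n, m = x.shape[0], y.shape[0]
    kxx = kernel(pairwise_sq_dists(x, x))
    kyy = kernel(pairwise_sq_dists(y, y))
    kxy = kernel(pairwise_sq_dists(x, y))
    # Within-sample averages exclude the diagonal (i == j) terms.
    within_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    within_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return within_x + within_y - 2.0 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 50                                        # hypothetical dimension
    x = rng.standard_normal((100, d))
    y = rng.standard_normal((100, d)) + 0.5       # mean-shifted alternative
    bw = np.sqrt(d)                               # assumed bandwidth scaling
    for name, k in [("gaussian", gaussian_kernel(bw)),
                    ("laplace", laplace_kernel(bw)),
                    ("energy", energy_kernel())]:
        print(name, mmd2_unbiased(x, y, k))
```

The bandwidth here grows like the square root of the dimension so that the kernel argument stays of constant order as the dimension increases; this is one common convention and is assumed for illustration only.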
