Empirical Bayes PCA in High Dimensions

Paper title

Empirical Bayes PCA in high dimensions

Authors

Zhong, Xinyi; Su, Chang; Fan, Zhou

Abstract

When the dimension of data is comparable to or larger than the number of data samples, Principal Components Analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an Empirical Bayes PCA (EB-PCA) method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer-Wolfowitz nonparametric MLE for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an Approximate Message Passing (AMP) algorithm. In theoretical "spiked" models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. We also illustrate the method on gene expression data obtained by single-cell RNA-seq.
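The empirical Bayes step described in the abstract can be illustrated in its simplest one-shot form. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes, as the random-matrix step is meant to justify, that each entry of a sample PC behaves approximately like `mu * x_i + sigma * z_i` with standard Gaussian noise `z_i`, and that `mu` and `sigma` are known (here they are fixed by hand rather than derived). It approximates the Kiefer-Wolfowitz NPMLE of the prior of `x_i` by running EM over mixing weights on a fixed grid, a common approximation that is not necessarily the solver used in the paper, and then denoises each entry with its posterior mean. The helper names `npmle_em` and `posterior_mean` are hypothetical, and the AMP refinement is omitted entirely.

```python
import numpy as np


def _grid_likelihood(y, mu, sigma, grid):
    """Gaussian likelihoods p(y_i | x = grid_k), rescaled per row to avoid underflow."""
    logL = -0.5 * ((y[:, None] - mu * grid[None, :]) / sigma) ** 2
    logL -= logL.max(axis=1, keepdims=True)  # row-wise constants cancel in every posterior
    return np.exp(logL)


def npmle_em(y, mu, sigma, grid, n_iter=500):
    """Approximate Kiefer-Wolfowitz NPMLE of the prior of x from y_i = mu*x_i + sigma*z_i.

    The prior is restricted to atoms on the fixed `grid`; EM re-weights the atoms.
    """
    L = _grid_likelihood(y, mu, sigma, grid)
    w = np.full(grid.size, 1.0 / grid.size)  # start from uniform weights
    for _ in range(n_iter):
        post = L * w                          # unnormalized posterior over atoms
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                 # EM update of the mixing weights
    return w


def posterior_mean(y, mu, sigma, grid, w):
    """Bayes denoiser E[x_i | y_i] under the estimated discrete prior."""
    post = _grid_likelihood(y, mu, sigma, grid) * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid


# Toy check (hypothetical data): a two-point (+/-1) prior, as for well-separated clusters.
rng = np.random.default_rng(0)
n, mu, sigma = 2000, 1.0, 0.5
x = rng.choice([-1.0, 1.0], size=n)
y = mu * x + sigma * rng.standard_normal(n)

grid = np.linspace(-3.0, 3.0, 121)
w = npmle_em(y, mu, sigma, grid)
x_eb = posterior_mean(y, mu, sigma, grid, w)

# Baseline: best linear shrinkage c*y, using the oracle second moment E[x^2] = 1.
x_lin = (mu / (mu**2 + sigma**2)) * y

mse_eb = np.mean((x_eb - x) ** 2)
mse_linear = np.mean((x_lin - x) ** 2)
print(f"EB posterior mean MSE: {mse_eb:.3f}, linear shrinkage MSE: {mse_linear:.3f}")
```

On this toy two-point prior the nonlinear posterior-mean denoiser should clearly beat even the oracle linear shrinkage, which is the kind of gain from strong prior structure the abstract refers to; with a Gaussian prior the posterior mean is itself linear, so the two would roughly coincide.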
