论文标题
用于稀疏主成分分析的近似算法
Approximation Algorithms for Sparse Principal Component Analysis
论文作者
论文摘要
主成分分析(PCA)是机器学习和多元统计数据中广泛使用的降低技术。为了提高PCA的解释性,已经提出了各种获得稀疏主方向负载的方法,这些方法称为稀疏主成分分析(SPCA)。在本文中,我们将阈值作为一个准确的,多项式时间,SPCA问题的近似算法,而无需对输入协方差矩阵施加任何限制性假设。我们使用单数值分解的第一个阈值算法在概念上很简单。比目前的最新速度快;并且在实践中表现良好。在负面的一面,我们(新颖的)理论界限不能准确预测这种方法的强大实际表现。第二算法解决了众所周知的半决赛编程松弛,然后使用小说,两个步骤确定性阈值方案来计算稀疏主载体。它在实践中运作良好,值得注意的是,我们的理论界限可以准确地预测这种稳定的实践表现,理论上的界限比当前的最新面前更好地弥合了理论实践差距。
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and multivariate statistics. To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA). In this paper, we present thresholding as a provably accurate, polynomial time, approximation algorithm for the SPCA problem, without imposing any restrictive assumptions on the input covariance matrix. Our first thresholding algorithm using the Singular Value Decomposition is conceptually simple; is faster than current state-of-the-art; and performs well in practice. On the negative side, our (novel) theoretical bounds do not accurately predict the strong practical performance of this approach. The second algorithm solves a well-known semidefinite programming relaxation and then uses a novel, two step, deterministic thresholding scheme to compute a sparse principal vector. It works very well in practice and, remarkably, this solid practical performance is accurately predicted by our theoretical bounds, which bridge the theory-practice gap better than current state-of-the-art.