Paper title

Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA)

Paper author

Frost, H. Robert

Abstract

We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs.
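
The formula the abstract refers to is the eigenvector-eigenvalue identity. For a Hermitian n x n matrix A with eigenvalues lambda_1(A), ..., lambda_n(A) and unit eigenvectors v_i, and with M_j denoting A after row j and column j are removed, it states that |v_{i,j}|^2 * prod_{k != i} (lambda_i(A) - lambda_k(A)) = prod_{k=1..n-1} (lambda_i(A) - lambda_k(M_j)). The sketch below only checks this identity numerically with NumPy on a small sample covariance matrix. It is not the EESPCA algorithm itself: the function name, the random test data, and the choice of the largest eigenvalue are illustrative assumptions, and the abstract does not specify how EESPCA approximates or thresholds these quantities.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared entries of the i-th eigenvector of a Hermitian matrix A,
    computed only from eigenvalues of A and of its principal sub-matrices.

    Hypothetical helper for illustration; assumes the eigenvalues of A
    are distinct so the denominator is non-zero.
    """
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    # Product over k != i of (lambda_i(A) - lambda_k(A)).
    denom = np.prod(np.delete(lam[i] - lam, i))
    out = np.empty(n)
    for j in range(n):
        # M_j: A with row j and column j removed.
        M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(M_j)
        out[j] = np.prod(lam[i] - mu) / denom
    return out


# Illustrative data: sample covariance of 50 observations of 6 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
A = np.cov(X, rowvar=False)

i = A.shape[0] - 1  # index of the largest eigenvalue (first principal component)
from_eigenvalues = squared_loadings_from_eigenvalues(A, i)

# Reference: squared loadings taken from a direct eigendecomposition.
_, vecs = np.linalg.eigh(A)
direct = vecs[:, i] ** 2

print(np.allclose(from_eigenvalues, direct))
```

Evaluating the identity exactly, as done here, needs one eigendecomposition per variable, so it is more expensive than a single decomposition of A. It is shown only to make the underlying formula concrete.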
