Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA)

Title

Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA)

Author

Frost, H. Robert

Abstract

We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan and Zhang, and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems, such as the analysis of high-dimensional data sets or the application of statistical techniques like resampling that involve the repeated calculation of sparse PCs.
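The formula the abstract refers to is the eigenvector-eigenvalue identity for Hermitian matrices. For an n x n Hermitian matrix A with eigenvalues λ_1(A), ..., λ_n(A), unit eigenvectors v_1, ..., v_n, and M_j the (n-1) x (n-1) principal submatrix obtained by deleting row j and column j of A:

$$|v_{i,j}|^2 \prod_{k=1,\,k\neq i}^{n} \bigl(\lambda_i(A)-\lambda_k(A)\bigr) = \prod_{k=1}^{n-1} \bigl(\lambda_i(A)-\lambda_k(M_j)\bigr)$$

A minimal sketch that checks this identity numerically, assuming a real symmetric matrix (for example a sample covariance matrix) with distinct eigenvalues and using NumPy. The function name `squared_loadings_from_eigenvalues` and the toy data are hypothetical illustrations, not code from the paper.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared loadings |v_i[j]|^2 of the i-th unit eigenvector of symmetric A.

    Uses only eigenvalues: those of A and those of each principal submatrix
    M_j (A with row j and column j deleted). Eigenvalues are indexed in
    ascending order, as returned by np.linalg.eigvalsh. Assumes the
    eigenvalues of A are distinct, otherwise the denominator is zero.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)                 # eigenvalues of A, ascending
    lam_i = lam[i]
    # Gap product on the left-hand side: lambda_i(A) minus every other eigenvalue of A.
    denom = np.prod(lam_i - np.delete(lam, i))
    sq = np.empty(n)
    for j in range(n):
        # Principal submatrix M_j: drop row j and column j.
        M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(M_j)            # n - 1 eigenvalues of M_j
        # Right-hand product divided by the left-hand gap product.
        sq[j] = np.prod(lam_i - mu) / denom
    return sq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 6))            # toy data: 50 samples, 6 variables
    S = np.cov(X, rowvar=False)                 # 6 x 6 sample covariance matrix
    top = S.shape[0] - 1                        # index of the largest eigenvalue (first PC)
    sq = squared_loadings_from_eigenvalues(S, top)
    _, V = np.linalg.eigh(S)                    # reference eigenvectors, columns of V
    print(np.round(sq, 4))
    print(np.allclose(sq, V[:, top] ** 2))      # compare with directly computed loadings
```

This direct form performs n + 1 eigendecompositions per call, so it is meant only to make the identity concrete. It recovers squared loadings (no signs) and does not reproduce how EESPCA turns them into sparse PCs with a fixed or cross-validated threshold; those details are in the full paper.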
