Paper Title
Vintage Factor Analysis with Varimax Performs Statistical Inference
Paper Authors
Paper Abstract
Psychologists developed Multiple Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors. In this form of factor analysis, the Varimax "factor rotation" is a key step to make the factors interpretable. Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant. These objections are still reported in all contemporary multivariate statistics textbooks. This is an enigma, because this vintage form of factor analysis has survived and remains widely popular: empirically, the factor rotation often makes the factors easier to interpret. We argue that the rotation makes the factors easier to interpret because, in fact, the Varimax factor rotation performs statistical inference. We show that Principal Components Analysis (PCA) with the Varimax rotation provides a unified spectral estimation strategy for a broad class of modern factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation (i.e., "topic modeling"). In addition, we show that Thurstone's widely employed sparsity diagnostics implicitly assess a key "leptokurtic" condition that makes the rotation statistically identifiable in these models. Taken together, this shows that the know-how of Vintage Factor Analysis performs statistical inference, reversing nearly a century of statistical thinking on the topic. With a sparse eigensolver, PCA with Varimax is both fast and stable. Combined with Thurstone's straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
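As a rough illustration of the procedure the abstract describes (PCA followed by a Varimax rotation), the following is a minimal NumPy sketch. It is not the authors' implementation. The function names `varimax` and `pca_varimax`, the column centering, the sqrt(n) scaling of the singular vectors, and the use of a dense SVD in place of a sparse eigensolver are all assumptions made for illustration. The rotation step uses the commonly seen SVD-based iteration for the Varimax criterion.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (p x k) matrix to increase the Varimax criterion.

    Returns the rotated matrix and the k x k orthogonal rotation R.
    This is the common SVD-based iteration, shown as a sketch only.
    """
    p, k = loadings.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient-like term of the Varimax criterion at the current rotation.
        G = loadings.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p)
        u, s, vh = np.linalg.svd(G)
        R = u @ vh  # product of orthogonal matrices, so R stays orthogonal
        d = np.sum(s)
        # Stop once the sum of singular values no longer grows appreciably.
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return loadings @ R, R


def pca_varimax(X, k):
    """Hypothetical helper: top-k PCA scores followed by a Varimax rotation.

    Assumptions (not taken from the source): columns are centered, a dense
    SVD is used, and the left singular vectors are scaled by sqrt(n).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    U_k = U[:, :k]  # leading k left singular vectors
    return varimax(np.sqrt(n) * U_k)
```

Calling `Z, R = pca_varimax(X, k)` on an n x p data matrix returns an n x k matrix of rotated factor scores and the rotation that produced them. For large sparse inputs, a sparse solver such as `scipy.sparse.linalg.svds` could replace the dense SVD, in line with the abstract's remark about sparse eigensolvers.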