Rank Selection for Non-negative Matrix Factorization

Paper Title

Rank Selection for Non-negative Matrix Factorization

Authors

Yun Cai, Hong Gu, Toby Kenney

Abstract

Non-negative matrix factorization (NMF) is a widely used dimension reduction method that factorizes a non-negative data matrix into two lower-dimensional non-negative matrices: one is the basis or feature matrix, which consists of the variables, and the other is the coefficient matrix, which holds the projections of the data points onto the new basis. The features can be interpreted as sub-structures of the data. The number of sub-structures in the feature matrix is called the rank, and it is the only tuning parameter in NMF. An appropriate rank extracts the key latent features while minimizing the noise from the original data. In this paper, we develop a novel rank selection method based on hypothesis testing, using a deconvolved bootstrap distribution to assess the significance level accurately despite the large amount of optimization error. In the simulation section, we compare our method with a rank selection method based on hypothesis testing that uses the bootstrap distribution without deconvolution, and with a cross-validated imputation method. Through simulations, we demonstrate that our method is not only accurate at estimating the true rank for NMF, especially when the features are hard to distinguish, but also computationally efficient. When applied to real microbiome data (e.g. OTU data and functional metagenomic data), our method also shows the ability to extract interpretable sub-communities in the data.
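
To make the role of the rank concrete, the sketch below factorizes a small non-negative matrix X into a feature matrix W and a coefficient matrix H at two candidate ranks and compares the reconstruction error. It is not the hypothesis-testing procedure described in the abstract: it uses the standard multiplicative-update rule for the Frobenius-norm objective, and the toy matrix, the function names (`nmf`, `frob_error`), and the iteration count are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(x, w, h):
    """Frobenius norm of the residual X - WH."""
    wh = matmul(w, h)
    return sum((xi - ai) ** 2
               for x_row, a_row in zip(x, wh)
               for xi, ai in zip(x_row, a_row)) ** 0.5

def nmf(x, rank, n_iter=500, seed=0, eps=1e-12):
    """Illustrative NMF via multiplicative updates (not the paper's method).

    Returns W (n x rank) and H (rank x m), both non-negative, with X ~ WH.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start so the updates are well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Toy data (assumed): a 4 x 5 matrix built from two non-negative features.
x = [
    [1, 2, 0, 1, 0],
    [0, 1, 3, 0, 2],
    [1, 3, 3, 1, 2],
    [2, 4, 0, 2, 0],
]

factors = {r: nmf(x, r) for r in (1, 2)}
errors = {r: frob_error(x, *factors[r]) for r in (1, 2)}
for r in (1, 2):
    print(f"rank {r}: reconstruction error {errors[r]:.4f}")
```

Reconstruction error alone keeps shrinking as the rank grows, so it cannot pick the rank by itself; that is the gap a formal selection procedure, such as the hypothesis test the abstract describes, is meant to fill.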
