Title
Single-cell gene regulatory network analysis for mixed cell populations with applications to COVID-19 single cell data
Authors
Abstract
A gene regulatory network (GRN) is the complex network formed by regulatory interactions between genes in living cells. In this paper, we consider inferring GRNs in single cells from single-cell RNA sequencing (scRNA-seq) data. In scRNA-seq, single cells are often profiled from mixed populations, and their cell identities are unknown. A common practice in single-cell GRN analysis is to first cluster the cells and then infer a GRN for each cluster separately. However, this two-step procedure ignores the uncertainty in the clustering step and can therefore lead to inaccurate estimates of the networks. To address this problem, we propose to model scRNA-seq data with the mixture multivariate Poisson log-normal (MPLN) distribution. The precision matrices of the MPLN are the GRNs of the different cell types and can be jointly estimated by maximizing the MPLN's lasso-penalized log-likelihood. We show that the MPLN model is identifiable and that the resulting penalized log-likelihood estimator is consistent. To avoid the intractable optimization of the MPLN's log-likelihood, we develop an algorithm called VMPLN based on the variational inference method. Comprehensive simulations and analyses of real scRNA-seq data show that VMPLN outperforms state-of-the-art single-cell GRN methods.
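To make the generative structure of a mixture Poisson log-normal model concrete, the following is a minimal simulation sketch, not the authors' VMPLN code. It assumes a plain parameterization that the abstract does not spell out: each cell draws a latent cell type, then a Gaussian vector of log-rates whose precision matrix plays the role of that type's network, then Poisson counts. The function name `sample_mpln`, its arguments, and all numeric values are hypothetical, and cell-specific scaling factors such as library size are left out.

```python
import numpy as np


def sample_mpln(n_cells, weights, means, precisions, rng):
    """Draw counts from a mixture of multivariate Poisson log-normal components.

    Illustrative sketch only: the names and the parameterization are
    assumptions, not the interface of VMPLN.
    """
    weights = np.asarray(weights, dtype=float)
    n_genes = len(means[0])
    # Latent cell-type label for every cell (unknown in real data).
    labels = rng.choice(len(weights), size=n_cells, p=weights)
    counts = np.empty((n_cells, n_genes), dtype=np.int64)
    for k in range(len(weights)):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue
        # The precision matrix encodes the network; its inverse is the covariance.
        cov = np.linalg.inv(precisions[k])
        latent = rng.multivariate_normal(means[k], cov, size=idx.size)
        # Given the latent log-rates, the counts are independent Poisson draws.
        counts[idx] = rng.poisson(np.exp(latent))
    return counts, labels


# Hypothetical example: two cell types, three genes.
rng = np.random.default_rng(0)
prec_a = np.array([[2.0, -0.8, 0.0],
                   [-0.8, 2.0, 0.0],
                   [0.0, 0.0, 2.0]])  # genes 0 and 1 are linked in type A
prec_b = 2.0 * np.eye(3)              # no links in type B
means = [np.array([1.0, 1.0, 0.5]), np.array([2.0, 0.0, 1.0])]
counts, labels = sample_mpln(500, [0.6, 0.4], means, [prec_a, prec_b], rng)
```

In this sketch a zero off-diagonal entry of a precision matrix means the two genes are conditionally independent at the latent level within that cell type, which is why a sparsity-inducing lasso penalty on the precision matrices is a natural fit for network estimation.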