Paper title
Mixture of Conditional Gaussian Graphical Models for unlabelled heterogeneous populations in the presence of co-factors
Paper authors
Abstract
Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector. In the case of an unlabelled heterogeneous population, Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed to estimate both each sub-population's graph and the class labels. However, we argue that, with most real data, class affiliation cannot be described with a Mixture of Gaussians, which mostly groups data points according to their geometrical proximity. In particular, there often exist external co-features whose values affect the features' average value, scattering data points that belong to the same sub-population across the feature space. Additionally, if the co-features' effect on the features is heterogeneous, then the estimation of this effect cannot be separated from the sub-population identification. In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into clusters that correspond to the sub-populations. We develop a penalised EM algorithm to estimate graph-sparse model parameters. We demonstrate on synthetic and real data how this method fulfils its goal and succeeds in identifying the sub-populations where the Mixtures of GGM are disrupted by the effect of the co-features.
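To make the idea in the abstract concrete, the following is a minimal sketch of an EM loop for a mixture of conditional Gaussians, in which each class has its own linear effect of the co-features on the feature means. It is not the authors' algorithm. The function names (`log_gauss`, `fit_mixture_conditional_gaussians`), the parametrisation by regression coefficients and covariances, the random initialisation, and the `ridge` term are all assumptions made for illustration. In particular, the small ridge on each covariance stands in for the graph-sparsity penalty described in the abstract, so this sketch does not produce sparse graphs.

```python
import numpy as np


def log_gauss(resid, cov):
    """Row-wise log-density of N(0, cov) evaluated at the rows of resid."""
    p = resid.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, resid.T)           # shape (p, n)
    maha = np.sum(sol ** 2, axis=0)                # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (maha + logdet + p * np.log(2.0 * np.pi))


def fit_mixture_conditional_gaussians(Y, X, K, n_iter=50, ridge=1e-3, seed=0):
    """EM for a K-class mixture where, in class k, Y | X ~ N(B_k^T [1, X], S_k).

    Y: (n, p) features, X: (n, q) co-features.
    Assumption: `ridge` is a simple stabiliser, NOT the sparsity penalty
    of the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    Xb = np.hstack([np.ones((n, 1)), X])           # add an intercept column
    d = Xb.shape[1]
    resp = rng.dirichlet(np.ones(K), size=n)       # random soft labels, (n, K)

    for _ in range(n_iter):
        # M-step: weighted least squares and weighted residual covariance.
        weights = resp.mean(axis=0)
        coefs, covs = [], []
        for k in range(K):
            w = resp[:, k]
            XtW = Xb.T * w                         # (d, n), columns scaled by w
            B = np.linalg.solve(XtW @ Xb + 1e-8 * np.eye(d), XtW @ Y)
            R = Y - Xb @ B                         # co-feature effect removed
            S = (R.T * w) @ R / w.sum() + ridge * np.eye(p)
            coefs.append(B)
            covs.append(S)

        # E-step: posterior class probabilities from the conditional densities.
        logp = np.column_stack([
            np.log(weights[k]) + log_gauss(Y - Xb @ coefs[k], covs[k])
            for k in range(K)
        ])
        logp -= logp.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        resp = np.clip(resp, 1e-12, None)          # keep every class non-empty
        resp /= resp.sum(axis=1, keepdims=True)

    return weights, coefs, covs, resp
```

Because class membership is computed from the residuals `Y - Xb @ coefs[k]` rather than from `Y` itself, points of one sub-population that the co-features have scattered across the feature space can still be assigned to the same class. Inverting each entry of `covs` gives a dense precision matrix; recovering a sparse conditional correlation graph would require replacing the ridge step with a sparsity-inducing penalised estimator, as the abstract describes.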