Paper title
Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution
Paper authors
Paper abstract
Finite mixture models have been widely used to model and analyze data from heterogeneous populations. Data of this kind can also be missing or subject to upper and/or lower detection limits because of the limitations of the experimental apparatus. A further complication arises when the measurements from each population depart significantly from normality, for instance by exhibiting asymmetric behavior. For such data structures, we propose a robust model for censored and/or missing data based on finite mixtures of multivariate skew-normal distributions. This approach allows us to model data with great flexibility, accommodating multimodality and skewness simultaneously, depending on the structure of the mixture components. We develop an analytically simple, yet efficient, EM-type algorithm for maximum likelihood estimation of the parameters. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated multivariate skew-normal distributions. Furthermore, a general information-based method for approximating the asymptotic covariance matrix of the estimators is presented. Results from the analysis of both simulated and real datasets are reported to demonstrate the effectiveness of the proposed method. The proposed algorithm and method are implemented in the new R package CensMFM.
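For orientation, the model class described in the abstract can be sketched as follows. The notation is an assumption: the symbols $G$, $\pi_k$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$, $\boldsymbol{\lambda}_k$ and the censoring region $\mathbb{A}_i$ do not appear in the abstract, and the skew-normal density shown is one commonly used parameterization, which may differ from the one adopted in the paper.

```latex
% Assumed notation: G components, mixing weights \pi_k, location \mu_k,
% scale matrix \Sigma_k, skewness vector \lambda_k (none taken from the abstract).
\begin{align*}
  % Finite mixture of p-variate skew-normal densities
  f(\mathbf{y};\boldsymbol{\Theta})
    &= \sum_{k=1}^{G} \pi_k\,
       \mathrm{SN}_p(\mathbf{y};\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k,\boldsymbol{\lambda}_k),
    \qquad \pi_k \ge 0,\quad \sum_{k=1}^{G}\pi_k = 1, \\[4pt]
  % One common form of the multivariate skew-normal density
  \mathrm{SN}_p(\mathbf{y};\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\lambda})
    &= 2\,\phi_p(\mathbf{y};\boldsymbol{\mu},\boldsymbol{\Sigma})\,
       \Phi\!\left(\boldsymbol{\lambda}^{\top}\boldsymbol{\Sigma}^{-1/2}
       (\mathbf{y}-\boldsymbol{\mu})\right), \\[4pt]
  % Likelihood contribution of subject i with observed part y_i^o and
  % censored/missing part known only to lie in the region A_i
  L_i(\boldsymbol{\Theta})
    &= \int_{\mathbb{A}_i}
       f(\mathbf{y}_i^{o},\mathbf{y}^{c};\boldsymbol{\Theta})\,
       \mathrm{d}\mathbf{y}^{c}.
\end{align*}
```

Here $\phi_p$ is the $p$-variate normal density and $\Phi$ the standard normal distribution function. Under this sketch, a coordinate censored at a detection limit restricts $\mathbb{A}_i$ to a half-line in that coordinate, while a missing coordinate leaves it unrestricted. Integrals of this form are what make the moments of truncated skew-normal distributions relevant at the E-step.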