Paper Title

Evaluating Independence and Conditional Independence Measures

Author

Ma, Jian

Abstract

Independence and conditional independence (CI) are two fundamental concepts in probability and statistics that can be applied to many central problems of statistical inference. Many independence and CI measures exist, defined from diverse principles and concepts. In this paper, 16 independence measures and 16 CI measures are reviewed and then evaluated with simulated and real data. For the independence measures, eight simulated datasets were generated from the normal distribution and from normal and Archimedean copula functions to compare the measures in bivariate or multivariate, linear or nonlinear settings. Two UCI datasets, the heart disease data and the wine quality data, were used to test the power of the independence measures in real conditions. For the CI measures, two simulated datasets with normal distribution and Gumbel copula and one real dataset (the Beijing air data) were used to test the measures in prespecified linear or nonlinear settings and in a real scenario. From the experimental results, we found that most of the measures work well on the simulated data, showing the correct monotonicity across the simulations. However, on the much more complex real data, the independence measures and the CI measures each behaved differently from one another, and only a few can be considered to work well when judged against domain knowledge. We also found that the measures tend to fall into groups according to the similarity of their behavior, both in each setting and overall. Based on the experiments, we recommend copula entropy (CE) as a good choice of both independence measure and CI measure. This recommendation is also supported by its rigorous distribution-free definition and consistent nonparametric estimator.
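
The monotonicity check mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's experimental code, and it does not implement CE or any of the 16 reviewed measures. As a stand-in, it uses the Gaussian plug-in mutual information, -0.5 * log(1 - r^2), where r is the sample Pearson correlation. The function names, the sample size, the seed, and the grid of correlation values are all assumptions chosen for illustration.

```python
import math
import random


def simulate_bivariate_normal(rho, n=2000, seed=0):
    """Draw n pairs from a standard bivariate normal with correlation rho.

    Hypothetical helper: n and seed are illustrative choices only.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing two independent normals gives correlation rho with z1.
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(pairs):
    """Stand-in dependence measure: mutual information under a Gaussian assumption."""
    r = pearson(pairs)
    return -0.5 * math.log(1.0 - r * r)


# Increasing true correlation should yield increasing measure values.
rhos = [0.0, 0.3, 0.6, 0.9]
scores = [gaussian_mi(simulate_bivariate_normal(rho)) for rho in rhos]
is_monotone = all(a < b for a, b in zip(scores, scores[1:]))

for rho, score in zip(rhos, scores):
    print(f"rho={rho:.1f}  measure={score:.3f}")
print("monotone:", is_monotone)
```

A measure that "works well" in this sense returns values that increase with the strength of the simulated dependence. The Gaussian plug-in used here only captures linear dependence, so it would not be a fair proxy in the nonlinear or copula-based settings described in the abstract.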
