Paper Title
Quantifying Distances Between Clusters with Elliptical or Non-Elliptical Shapes
Paper Authors
Paper Abstract
Finite mixture modeling that allows for a broad range of potentially non-elliptical cluster distributions is an emerging methodological field. Such methods allow the shape of the clusters to match the natural heterogeneity of the data, rather than forcing a series of elliptical clusters. These methods are highly relevant for clustering continuous non-normal data, a common occurrence with the objective data that are now routinely captured in health research. However, interpreting and comparing such models, especially with regard to whether they produce meaningful clusters that are reasonably well separated, is non-trivial. We summarize several measures that can succinctly quantify the multivariate distance between two clusters, regardless of the cluster distribution, and suggest practical computational tools. Through a simulation study, we evaluate these measures across three scenarios that allow clusters to differ in mean, scale, and rotation. We then demonstrate our approaches using physiological responses to emotional imagery captured as part of the Transdiagnostic Anxiety Study, a large-scale study of anxiety disorder spectrum patients and control participants. Finally, we synthesize the findings to provide guidance on how to use distance measures in clustering applications.