Paper Title

A Greedy and Optimistic Approach to Clustering with a Specified Uncertainty of Covariates

Authors

Akifumi Okuno, Kohei Hattori

Abstract

In this study, we examine a clustering problem in which the covariates of each element in a dataset are associated with an uncertainty specific to that element. More specifically, we consider a clustering approach in which a pre-processing step applies a non-linear transformation to the covariates to capture the hidden data structure. To this end, we empirically approximate the sets representing the propagated uncertainty of the pre-processed features. To exploit these empirical uncertainty sets, we propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over such sets, yielding more condensed clusters. As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars, generated through our numerical simulation mimicking the formation process of the Milky Way. The GOC algorithm demonstrates improved performance in finding sibling stars originating from the same dwarf galaxy. These realistic datasets have also been made publicly available.
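
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' GOC implementation: it is a k-means-style stand-in written under several assumptions. The function names (`empirical_uncertainty_sets`, `optimistic_kmeans`), the Gaussian noise model, and the squared-Euclidean objective are all hypothetical choices made for illustration. Each element gets a finite set of candidate features, obtained by perturbing its covariates according to its own uncertainty and applying the non-linear transformation; the clustering step then "optimistically" picks, for each element, the candidate that lies closest to some cluster center.

```python
import numpy as np


def empirical_uncertainty_sets(x, sigma, transform, n_candidates=20, seed=0):
    """Approximate each element's propagated uncertainty set by sampling.

    x: (n, d) observed covariates; sigma: (n, d) per-element noise scales
    (Gaussian noise is an assumption of this sketch).
    transform: maps an (N, d) array to an (N, p) array of features.
    Returns (n, n_candidates + 1, p); candidate 0 is the unperturbed feature.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    noise = rng.normal(size=(n, n_candidates, d)) * sigma[:, None, :]
    samples = np.concatenate([x[:, None, :], x[:, None, :] + noise], axis=1)
    features = transform(samples.reshape(-1, d))
    return features.reshape(n, n_candidates + 1, -1)


def _assign(candidates, centers):
    """For each element, pick the (candidate, center) pair with least distance."""
    n, m, _ = candidates.shape
    k = centers.shape[0]
    diff = candidates[:, :, None, :] - centers[None, None, :, :]
    d2 = (diff ** 2).sum(axis=-1)  # shape (n, m, k)
    best = d2.reshape(n, -1).argmin(axis=1)
    cand_idx, labels = np.unravel_index(best, (m, k))
    return labels, candidates[np.arange(n), cand_idx]


def optimistic_kmeans(candidates, k, n_iter=50, seed=0):
    """Alternate optimistic candidate selection and center updates."""
    rng = np.random.default_rng(seed)
    n = candidates.shape[0]
    init = rng.choice(n, size=k, replace=False)
    centers = candidates[init, 0, :].copy()
    for _ in range(n_iter):
        labels, chosen = _assign(candidates, centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster is empty
                new_centers[j] = chosen[mask].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so labels and chosen features match the returned centers.
    labels, chosen = _assign(candidates, centers)
    return labels, chosen, centers
```

Because each element may use any candidate in its set, the within-cluster cost under the returned centers is never larger than the cost obtained from the unperturbed features alone, which is the sense in which the selected features yield more condensed clusters in this sketch.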
