Paper title
Almost exact recovery in noisy semi-supervised learning
Paper authors
Abstract
Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.
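As a rough illustration of the setting the abstract describes, the sketch below samples a two-community DC-SBM graph together with labels from a noisy oracle that reveals only a fraction of the nodes. It covers the data model only, not the MAP estimator or its continuous relaxation. The function name, every parameter name (`p_in`, `p_out`, `theta`, `reveal_frac`, `flip_prob`), and the encoding of unrevealed labels as 0 are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def sample_dcsbm_with_noisy_oracle(n, p_in, p_out, theta, reveal_frac,
                                   flip_prob, seed=0):
    """Sample a two-community DC-SBM graph plus noisy oracle labels.

    Hypothetical helper: all names are illustrative assumptions.
    Returns (adjacency, true_labels, oracle_labels), where oracle_labels
    holds 0 for "not revealed" and +1/-1 for a revealed (possibly wrong) label.
    """
    rng = np.random.default_rng(seed)
    z = rng.choice([-1, 1], size=n)              # true community of each node
    same = np.equal.outer(z, z)                  # True where i and j share a community
    base = np.where(same, p_in, p_out)           # block-level edge probability
    # Degree correction: scale each pair by theta_i * theta_j, clipped to [0, 1].
    prob = np.clip(np.outer(theta, theta) * base, 0.0, 1.0)
    upper = np.triu(rng.random((n, n)) < prob, k=1)   # sample pairs i < j only
    adj = (upper | upper.T).astype(int)          # symmetric, no self-loops
    revealed = rng.random(n) < reveal_frac       # nodes the oracle labels
    flipped = rng.random(n) < flip_prob          # oracle mistakes
    oracle = np.where(revealed, np.where(flipped, -z, z), 0)
    return adj, z, oracle
```

For example, `sample_dcsbm_with_noisy_oracle(200, 0.1, 0.02, np.ones(200), 0.3, 0.2)` would give a 200-node graph in which roughly 30% of the nodes carry an oracle label and roughly 20% of those labels are wrong. With `theta` set to all ones, the model reduces to a plain SBM.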