Paper Title

Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning

Paper Authors

Chengming Xu, Chen Liu, Siqian Yang, Yabiao Wang, Shijie Zhang, Lijie Jia, Yanwei Fu

Paper Abstract

Positive-Unlabeled (PU) learning aims to learn a model from rare positive samples and abundant unlabeled samples. Compared with classical binary classification, PU learning is much more challenging due to the existence of many incompletely-annotated data instances. Since only part of the most confident positive samples are labeled and the evidence is not enough to categorize the remaining samples, many of the unlabeled data may also be positive samples. Research on this topic is particularly useful and essential for many real-world tasks that demand very expensive labeling costs. For example, recognition tasks in disease diagnosis, recommendation systems and satellite image recognition may have only a few positive samples that can be annotated by experts. Existing methods mainly omit the intrinsic hardness of some unlabeled data, which can result in sub-optimal performance as a consequence of fitting the easy noisy data while not sufficiently utilizing the hard data. In this paper, we focus on improving the commonly-used nnPU with a novel training pipeline. We highlight the intrinsic difference in hardness among samples in the dataset and the proper learning strategies for easy and hard data. Taking this into account, we propose to first split the unlabeled dataset with an early-stop strategy: samples whose predictions are inconsistent between the temporary and the base model are considered hard samples. The model then applies a noise-tolerant Jensen-Shannon divergence loss to the easy data, and a dual-source consistency regularization to the hard data, which includes a cross-consistency between the student and the base model on low-level features and a self-consistency on high-level features and predictions.
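
The abstract mentions three concrete ingredients: the nnPU risk estimator being improved, an easy/hard split of unlabeled data via prediction inconsistency between a temporary (early-stopped) model and the base model, and a Jensen-Shannon divergence loss for the easy subset. The following is a minimal PyTorch sketch of these ideas only, not the authors' released code; the class prior `pi_p`, the loader format, and all function names are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of:
#   1) the nnPU risk with a sigmoid-style loss,
#   2) splitting unlabeled data into easy/hard by prediction (in)consistency,
#   3) a Jensen-Shannon divergence loss for the easy subset.
import torch

def nnpu_risk(prob_p, prob_u, pi_p=0.4):
    """Non-negative PU risk.
    prob_p: positive-class probabilities of labeled positive samples.
    prob_u: positive-class probabilities of unlabeled samples.
    pi_p:   assumed positive class prior (dataset dependent)."""
    loss_p_pos = (1.0 - prob_p).mean()        # positives scored as positive
    loss_p_neg = prob_p.mean()                # positives scored as negative
    loss_u_neg = prob_u.mean()                # unlabeled treated as negative
    neg_risk = loss_u_neg - pi_p * loss_p_neg # corrected negative risk
    # nnPU clamps the negative risk at zero to avoid over-fitting.
    return pi_p * loss_p_pos + torch.clamp(neg_risk, min=0.0)

def split_easy_hard(temp_model, base_model, unlabeled_loader, device="cpu"):
    """Mark unlabeled samples whose predictions agree between the early-stopped
    temporary model and the base model as easy; disagreements are hard."""
    easy_idx, hard_idx = [], []
    temp_model.eval(); base_model.eval()
    with torch.no_grad():
        for idx, x in unlabeled_loader:       # loader assumed to yield (index, input)
            x = x.to(device)
            pred_t = torch.sigmoid(temp_model(x)).squeeze(-1) > 0.5
            pred_b = torch.sigmoid(base_model(x)).squeeze(-1) > 0.5
            agree = (pred_t == pred_b).cpu()
            easy_idx += idx[agree].tolist()
            hard_idx += idx[~agree].tolist()
    return easy_idx, hard_idx

def js_divergence_loss(logits, target_prob, eps=1e-7):
    """Jensen-Shannon divergence between the predicted distribution over
    {negative, positive} and a (possibly noisy) target distribution."""
    p_pos = torch.sigmoid(logits).squeeze(-1)
    p = torch.stack([1 - p_pos, p_pos], dim=-1)
    q = torch.stack([1 - target_prob, target_prob], dim=-1)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(eps).log() - b.clamp_min(eps).log())).sum(-1)
    return (0.5 * kl(p, m) + 0.5 * kl(q, m)).mean()
```

The dual-source consistency regularization for hard data (cross-consistency on low-level features between the student and the base model, plus self-consistency on high-level features and predictions) is described only at a high level in the abstract, so it is not sketched here.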
