Paper title
Binary Classification with Positive Labeling Sources
Paper authors
Paper abstract
To create large numbers of training labels for machine learning models effectively and efficiently, researchers have turned to Weak Supervision (WS), which uses programmatic labeling sources rather than manual annotation. Existing WS work on binary classification typically assumes the presence of labeling sources that can assign both positive and negative labels to data in roughly balanced proportions. However, for many tasks of interest where the positive class is a minority, negative examples can be too diverse for developers to write indicative labeling sources. Thus, in this work, we study the application of WS to binary classification tasks with positive labeling sources only. We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources. On 10 benchmark datasets, we show that WEAPO achieves the highest average performance in terms of both the quality of the synthesized labels and the performance of the final classifier supervised with these labels. We have incorporated the implementation of WEAPO into WRENCH, an existing benchmarking platform.