Paper Title
Denoising Multi-Source Weak Supervision for Neural Text Classification
Paper Authors
Paper Abstract
We study the problem of learning neural text classifiers without using any labeled data, relying only on easy-to-provide rules as multiple weak supervision sources. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these two challenges, we design a label denoiser, which estimates source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels. The denoised pseudo labels then supervise a neural classifier to predict soft labels for unmatched samples, which addresses the rule coverage issue. We evaluate our model on five benchmarks for sentiment, topic, and relation classification. The results show that our model consistently outperforms state-of-the-art weakly-supervised and semi-supervised methods, and achieves performance comparable to fully-supervised methods even without any labeled data. Our code can be found at https://github.com/weakrules/Denoise-multi-weak-sources.
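The abstract's aggregation step can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a fixed per-source reliability score turned into attention weights via a plain softmax, whereas the paper conditions the attention on each sample. Rules that do not match a sample abstain, which is the coverage gap the neural classifier is meant to fill.

```python
import numpy as np

def denoise_weak_labels(weak_labels, reliability, num_classes):
    """Aggregate rule-annotated weak labels into soft pseudo labels.

    weak_labels: (n, k) int array; entry is a class id in [0, num_classes)
                 or -1 when the rule abstains on that sample.
    reliability: (k,) hypothetical per-source scores; their softmax stands
                 in for the paper's conditional soft attention weights.
    Returns (pseudo, covered): (n, num_classes) soft labels and a boolean
    mask of samples matched by at least one rule.
    """
    attn = np.exp(reliability - reliability.max())
    attn /= attn.sum()                        # softmax over the k sources
    n, k = weak_labels.shape
    pseudo = np.zeros((n, num_classes))
    for j in range(k):                        # add each source's weighted votes
        matched = weak_labels[:, j] >= 0
        pseudo[matched, weak_labels[matched, j]] += attn[j]
    norm = pseudo.sum(axis=1, keepdims=True)
    covered = norm.squeeze(-1) > 0            # matched by >= 1 rule
    pseudo[covered] /= norm[covered]          # normalize to a distribution
    return pseudo, covered                    # unmatched rows stay all-zero

# Toy example: 3 samples, 2 rule sources, binary classification.
wl = np.array([[1, 1],    # both rules vote class 1
               [0, -1],   # only source 0 matches, votes class 0
               [-1, -1]]) # no rule matches -> left for the classifier
pseudo, covered = denoise_weak_labels(wl, np.array([2.0, 1.0]), num_classes=2)
```

Here `pseudo` becomes `[[0, 1], [1, 0], [0, 0]]` and `covered` is `[True, True, False]`; in the full model, the unmatched third sample would receive a soft label from the neural classifier rather than from the rules.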