Paper Title
Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection
Paper Authors
Paper Abstract
Most existing approaches to disfluency detection rely heavily on human-annotated corpora, which are expensive to obtain in practice. Several proposals alleviate this issue with, for instance, self-supervised learning techniques, but they still require human-annotated corpora. In this work, we explore the unsupervised learning paradigm, which can potentially work with unlabeled text corpora that are cheaper and easier to obtain. Our model builds upon recent work on Noisy Student Training, a semi-supervised learning approach that extends the idea of self-training. Experimental results on the commonly used English Switchboard test set show that our approach achieves competitive performance compared to the previous state-of-the-art supervised systems using contextualized word embeddings (e.g. BERT and ELECTRA).
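The self-training idea that Noisy Student Training extends can be illustrated with a toy loop: a teacher tags unlabeled sentences, a student is trained on those pseudo-labels with noise, and the student then becomes the next teacher. The sketch below is only an illustration under stated assumptions, not the model described in this paper. The count-based per-word tagger, the token-dropout noise, and the names `train`, `predict`, and `noisy_student` are all hypothetical. The abstract does not say how the first teacher is obtained, so the sketch simply takes a small seed set as given; that set stands in for any automatically constructed starting data.

```python
import random
from collections import defaultdict


def train(examples, drop_prob=0.0, rng=None):
    """Fit a toy tagger that estimates P(disfluent | word) by counting.

    drop_prob randomly skips tokens while training; it is a simple
    stand-in for the input noise applied to the student (an assumption).
    """
    rng = rng or random.Random(0)
    counts = defaultdict(lambda: [0, 0])  # word -> [fluent count, disfluent count]
    for tokens, labels in examples:
        for tok, lab in zip(tokens, labels):
            if rng.random() < drop_prob:
                continue  # noise: this token is hidden from the student
            counts[tok][lab] += 1
    return {w: c[1] / (c[0] + c[1]) for w, c in counts.items()}


def predict(model, tokens, threshold=0.5):
    """Tag each token: 1 = disfluent, 0 = fluent. Unknown words default to fluent."""
    return [1 if model.get(t, 0.0) >= threshold else 0 for t in tokens]


def noisy_student(seed_examples, unlabeled, rounds=3, drop_prob=0.1, seed=0):
    """Iterate teacher -> pseudo-labels -> noisy student -> new teacher."""
    rng = random.Random(seed)
    teacher = train(seed_examples)  # initial teacher, trained without noise
    for _ in range(rounds):
        # The teacher labels the unlabeled sentences.
        pseudo = [(sent, predict(teacher, sent)) for sent in unlabeled]
        # The student trains on seed + pseudo-labeled data with noise,
        # then takes over as the teacher for the next round.
        teacher = train(seed_examples + pseudo, drop_prob=drop_prob, rng=rng)
    return teacher
```

Calling `noisy_student(seed_examples, unlabeled)` with lists of token sequences returns a word-to-score dictionary that `predict` can apply to new sentences. A real system would replace the counting model with a neural tagger and use stronger noise such as dropout or data augmentation.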