Paper Title

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

Authors

Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, Chao Zhang

Abstract

We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision does not require large amounts of manual annotation, but it yields highly incomplete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework, BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: in the first stage, we adapt the pre-trained language model to the NER task using the distant labels, which can significantly improve recall and precision; in the second stage, we drop the distant labels and propose a self-training approach to further improve model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND.
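
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the BOND implementation: the per-token majority-vote "model" (`train_lookup`), the function name `two_stage_training`, the `rounds` parameter, and the example sentences are all assumptions made for illustration, standing in for a fine-tuned pre-trained language model. The sketch only shows the control flow: fit on distant labels first, then discard them and retrain on the model's own predictions.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]
Labels = List[str]
Model = Callable[[Tokens], Labels]


def train_lookup(pairs: Sequence[Tuple[Tokens, Labels]]) -> Model:
    """Toy stand-in for fine-tuning: predict each token's most frequent label."""
    counts = defaultdict(Counter)
    for tokens, labels in pairs:
        for tok, lab in zip(tokens, labels):
            counts[tok][lab] += 1
    table = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

    def model(tokens: Tokens) -> Labels:
        # Unseen tokens default to the non-entity tag "O".
        return [table.get(t, "O") for t in tokens]

    return model


def two_stage_training(sentences: List[Tokens],
                       distant_labels: List[Labels],
                       rounds: int = 2) -> Model:
    # Stage 1: adapt the model using the (noisy, incomplete) distant labels.
    model = train_lookup(list(zip(sentences, distant_labels)))
    # Stage 2: drop the distant labels; the current model acts as a teacher
    # whose predictions become pseudo-labels for the next model.
    for _ in range(rounds):
        pseudo_labels = [model(s) for s in sentences]
        model = train_lookup(list(zip(sentences, pseudo_labels)))
    return model


# Hypothetical data: the knowledge base missed "Paris" in the last sentence.
sentences = [["Paris", "is", "big"], ["I", "love", "Paris"], ["Paris", "sleeps"]]
distant = [["LOC", "O", "O"], ["O", "O", "LOC"], ["O", "O"]]

model = two_stage_training(sentences, distant)
print(model(["Paris", "sleeps"]))
```

In this toy setting the mention missed by the distant labels is recovered because the model generalizes from the other two occurrences; a real system would replace `train_lookup` with fine-tuning of a pre-trained language model.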
