Paper Title
STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction
Paper Authors
Paper Abstract
We present a simple yet effective self-training approach, named STAD, for low-resource relation extraction. The approach first classifies the auto-annotated instances into two groups, confident instances and uncertain instances, according to the probabilities predicted by a teacher model. In contrast to most previous studies, which use only the confident instances for self-training, we also make use of the uncertain instances. To this end, we propose a method to identify ambiguous but useful instances among the uncertain ones, and then divide the relations into a candidate-label set and a negative-label set for each ambiguous instance. Next, we propose a set-negative training method on the negative-label sets of the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all the data. Experimental results on two widely used datasets, SemEval2010 Task-8 and Re-TACRED, under low-resource settings demonstrate that this new self-training approach achieves significant and consistent improvements over several competitive self-training systems. Code is publicly available at https://github.com/jjyunlp/STAD
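The partitioning and set-negative training described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the thresholds (`confident_threshold`, `candidate_mass`) and the rule for building the candidate-label set (smallest top-ranked set covering a fixed probability mass) are assumptions for the sake of the example.

```python
import math

def split_instances(probs, confident_threshold=0.9, candidate_mass=0.9):
    """Split teacher-annotated instances into confident and ambiguous groups.

    probs: list of dicts mapping relation label -> teacher probability.
    Returns (confident, ambiguous); each ambiguous entry carries a
    candidate-label set and a negative-label set.
    Thresholds here are illustrative, not the paper's values.
    """
    confident, ambiguous = [], []
    for i, p in enumerate(probs):
        top_label, top_prob = max(p.items(), key=lambda kv: kv[1])
        if top_prob >= confident_threshold:
            # high-confidence prediction: used for positive training
            confident.append((i, top_label))
        else:
            # candidate set: smallest set of top-ranked labels whose
            # cumulative probability reaches candidate_mass
            ranked = sorted(p.items(), key=lambda kv: -kv[1])
            candidates, mass = [], 0.0
            for label, prob in ranked:
                candidates.append(label)
                mass += prob
                if mass >= candidate_mass:
                    break
            negatives = [lab for lab in p if lab not in candidates]
            ambiguous.append((i, candidates, negatives))
    return confident, ambiguous

def set_negative_loss(p, negative_labels):
    """Set-negative training objective (sketch): push down the predicted
    probability of every label in the negative-label set,
    i.e. -sum_y log(1 - p(y)) over the negative labels."""
    return -sum(math.log(1.0 - p[y]) for y in negative_labels)
```

In a full system these losses would be computed from the student model's logits and combined with the standard cross-entropy on the confident instances for joint training; the dictionary-based form above simply makes the grouping logic explicit.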