Paper Title

UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data

Authors

Qianhui Wu, Zijia Lin, Börje F. Karlsson, Biqing Huang, Jian-Guang Lou

Abstract

Prior works in cross-lingual named entity recognition (NER) with no/little labeled data fall into two primary categories: model transfer based and data transfer based methods. In this paper, we find that both method types can complement each other, in the sense that the former can exploit context information via language-independent features but sees no task-specific information in the target language, while the latter generally generates pseudo target-language training data via translation but its exploitation of context information is weakened by inaccurate translations. Moreover, prior works rarely leverage unlabeled data in the target language, which can be effortlessly collected and potentially contains valuable information for improved results. To handle both problems, we propose a novel approach termed UniTrans to Unify both model and data Transfer for cross-lingual NER, and furthermore, to leverage the available information from unlabeled target-language data via enhanced knowledge distillation. We evaluate our proposed UniTrans over 4 target languages on benchmark datasets. Our experimental results show that it substantially outperforms the existing state-of-the-art methods.
