CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

Paper title

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

Authors

Xu, Liang; Tong, Yu; Dong, Qianqian; Liao, Yixuan; Yu, Cong; Tian, Yin; Liu, Weitang; Li, Lu; Liu, Caiquan; Zhang, Xuanwei

Abstract

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined, fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels such as person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance along with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.
