Paper Title
Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool
Paper Authors
Abstract
In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts for unifying the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, BOUN Treebank is the largest Turkish treebank. It contains a total of 9,761 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved performance with regard to dependency parsing.