SIGTYP 2020 Shared Task: Prediction of Typological Features

Paper Title

SIGTYP 2020 Shared Task: Prediction of Typological Features

Authors

Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein

Abstract

Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that most languages only have annotations for some features, and skewed, in that few features have wide coverage. As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs, which is also the focus of this shared task. Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations. However, our error analysis reveals that even the strongest submitted systems struggle with predicting feature values for languages where few features are known.
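
The idea that correlated features can be used to fill gaps in a typological KB can be illustrated with a minimal sketch. This is not any submitted system's method: it is a simple co-occurrence vote over a hypothetical toy KB, and the language names, feature names, values, and the `predict_feature` helper are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy knowledge base: language -> {feature: value}.
# Names and values are illustrative only, not taken from WALS.
KB = {
    "lang_a": {"word_order": "SOV", "adposition": "postposition"},
    "lang_b": {"word_order": "SOV", "adposition": "postposition"},
    "lang_c": {"word_order": "SVO", "adposition": "preposition"},
    "lang_d": {"word_order": "SVO", "adposition": "preposition"},
    "lang_e": {"word_order": "SOV"},  # adposition is missing
}


def predict_feature(kb, language, target):
    """Guess a missing feature value by co-occurrence voting (assumed heuristic)."""
    known = kb[language]
    votes = Counter()
    for other, feats in kb.items():
        if other == language or target not in feats:
            continue
        # One vote per known feature value shared with the other language.
        shared = sum(1 for f, v in known.items() if feats.get(f) == v)
        votes[feats[target]] += shared
    if votes and max(votes.values()) > 0:
        return votes.most_common(1)[0][0]
    # Fallback: global majority value when no correlated evidence exists.
    overall = Counter(f[target] for f in kb.values() if target in f)
    return overall.most_common(1)[0][0] if overall else None


prediction = predict_feature(KB, "lang_e", "adposition")
print(prediction)
```

When the target language has few or no known features, the vote has little evidence to draw on and the sketch falls back to the overall majority value. That is a simplified view of why sparsely annotated languages are harder to predict.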
