Paper Title
FarsTail: A Persian Natural Language Inference Dataset
Paper Authors
Paper Abstract
Natural language inference (NLI) is known as one of the central tasks in natural language processing (NLP), as it encapsulates many fundamental aspects of language understanding. With the considerable achievements of data-hungry deep learning methods on NLP tasks, a great amount of effort has been devoted to developing more diverse datasets for different languages. In this paper, we present a new dataset for the NLI task in the Persian language, also known as Farsi, one of the dominant languages in the Middle East. This dataset, named FarsTail, includes 10,367 samples provided both as Persian text and in an indexed format that is usable by non-Persian researchers. The samples are generated from 3,539 multiple-choice questions with minimal annotator intervention, in a manner similar to the SciTail dataset, and a carefully designed multi-step process is adopted to ensure the quality of the dataset. We also present the results of traditional and state-of-the-art methods on FarsTail, including different embedding methods such as word2vec, fastText, ELMo, BERT, and LASER, as well as different modeling approaches such as DecompAtt, ESIM, HBMP, and ULMFiT, to provide a solid baseline for future research. The best obtained test accuracy is 83.38%, which shows that there is considerable room for improving current methods before they are useful for real-world NLP applications in different languages. We also investigate the extent to which the models exploit superficial clues, also known as dataset biases, in FarsTail, and partition the test set into easy and hard subsets according to the success of biased models. The dataset is available at https://github.com/dml-qom/FarsTail
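As a rough illustration of how such premise–hypothesis–label NLI samples might be consumed, the following minimal Python sketch loads one split from a tab-separated file. The file name `Test-word.csv`, the tab delimiter, and the column names `premise`, `hypothesis`, and `label` are assumptions made for illustration and are not confirmed by the abstract; consult the GitHub repository for the actual file layout.

```python
import csv

def load_nli_split(path):
    """Load premise/hypothesis/label triples from a tab-separated file.

    Assumes a header row with columns named 'premise', 'hypothesis',
    and 'label' (hypothetical layout, not taken from the paper).
    """
    samples = []
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            samples.append(
                {
                    "premise": row["premise"],
                    "hypothesis": row["hypothesis"],
                    # NLI labels are typically entailment / contradiction / neutral
                    "label": row["label"],
                }
            )
    return samples

if __name__ == "__main__":
    # File name is an assumption; replace with the actual split file.
    test_samples = load_nli_split("Test-word.csv")
    print(len(test_samples), "samples loaded")
    print(test_samples[0])
```

A sketch like this is only meant to show the shape of the data (pairs of sentences with a categorical label); evaluating a biased or full model on the easy and hard test subsets mentioned in the abstract would simply amount to running it over two such files and comparing accuracies.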