Paper Title
A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
Paper Authors
Paper Abstract
Machine Reading Comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, many MRC models have already surpassed human performance on various benchmark datasets, despite the obvious large gap between existing MRC models and genuine human-level reading comprehension. This shows the need to improve existing datasets, evaluation metrics, and models in order to move current MRC models toward "real" understanding. To address the current lack of a comprehensive survey of existing MRC tasks, evaluation metrics, and datasets, herein (1) we analyze 57 MRC tasks and datasets and propose a more precise classification method for MRC tasks with 4 different attributes; (2) we summarize 9 evaluation metrics of MRC tasks, as well as 7 attributes and 10 characteristics of MRC datasets; (3) we discuss key open issues in MRC research and highlight future research directions. In addition, we have collected, organized, and published our data on the companion website (https://mrc-datasets.github.io/), where MRC researchers can directly access each MRC dataset along with its papers, baseline projects, and leaderboard.