Paper Title

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Authors

Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin

Abstract

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at http://covidqa.ai/.
