Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Paper title

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Authors

Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin

Abstract

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at http://covidqa.ai/

Paper title