RealTime QA: What's the Answer Right Now?

Paper title

RealTime QA: What's the Answer Right Now?

Authors

Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui

Abstract

We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). REALTIME QA inquires about the current world, and QA systems need to answer questions about novel events or information. It therefore challenges static, conventional assumptions in open-domain QA datasets and pursues instantaneous applications. We build strong baseline models upon large pretrained language models, including GPT-3 and T5. Our benchmark is an ongoing effort, and this paper presents real-time evaluation results over the past year. Our experimental results show that GPT-3 can often properly update its generation results, based on newly-retrieved documents, highlighting the importance of up-to-date information retrieval. Nonetheless, we find that GPT-3 tends to return outdated answers when retrieved documents do not provide sufficient information to find an answer. This suggests an important avenue for future research: can an open-domain QA system identify such unanswerable cases and communicate with the user or even the retrieval module to modify the retrieval results? We hope that REALTIME QA will spur progress in instantaneous applications of question answering and beyond.
