Paper Title

An Analysis of Dataset Overlap on Winograd-Style Tasks

Paper Authors

Ali Emami, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung

Paper Abstract

The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model performance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlap between these training corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the corpora on which state-of-the-art models are (pre)trained, and that a significant drop in classification accuracy occurs when we evaluate models on instances with minimal overlap. Based on these results, we develop the KnowRef-60K dataset, which consists of over 60k pronoun disambiguation problems scraped from web data. KnowRef-60K is the largest corpus to date for WSC-style common-sense reasoning and exhibits a significantly lower proportion of overlaps with current pretraining corpora.
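The abstract describes measuring the degree of overlap between test instances and (pre)training corpora. The paper's exact procedure is not specified here; the sketch below shows one common way such overlap is quantified, as the fraction of a test instance's n-grams that also appear in the corpus. The tokenizer (whitespace split), n-gram size, and scoring are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch: n-gram overlap between a test instance and a
# pretraining corpus. Tokenization, n=3, and the scoring function are
# assumptions for demonstration, not the paper's actual procedure.

def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(instance, corpus_text, n=3):
    """Fraction of the instance's n-grams that also occur in the corpus."""
    inst = ngrams(instance.lower().split(), n)
    corp = ngrams(corpus_text.lower().split(), n)
    if not inst:
        return 0.0
    return len(inst & corp) / len(inst)

# Toy example with a Winograd-style sentence pair.
corpus = ("the trophy would not fit in the brown suitcase "
          "because it was too big")
test = "the trophy does not fit in the suitcase because it was too big"
print(round(overlap_score(test, corpus), 2))  # prints 0.55
```

Under a threshold on such a score, test instances can be partitioned into high- and low-overlap subsets, which is the kind of split on which the abstract reports a significant accuracy drop.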
