Paper Title
SemEval-2020 Task 4: Commonsense Validation and Explanation
Paper Authors
Paper Abstract
In this paper, we present SemEval-2020 Task 4, Commonsense Validation and Explanation (ComVE), which comprises three subtasks aiming to evaluate whether a system can distinguish a natural language statement that makes sense to humans from one that does not, and provide the reasons. Specifically, in the first subtask, participating systems are required to identify which of two natural language statements with similar wording makes sense and which does not. The second subtask additionally asks a system to select, from three options, the key reason why a given statement does not make sense. In the third subtask, a participating system needs to generate that reason. In total, we attracted 39 teams, each participating in at least one of the three subtasks. For Subtask A and Subtask B, the performance of the top-ranked systems is close to that of humans. However, for Subtask C, a relatively large gap remains between system and human performance. The dataset used in our task can be found at https://github.com/wangcunxiang/SemEval2020-Task4-Commonsense-Validation-and-Explanation; the leaderboard can be found at https://competitions.codalab.org/competitions/21080#results.