Paper Title

Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution

Paper Authors

Yordan Yordanov, Oana-Maria Camburu, Vid Kocijan, Thomas Lukasiewicz

Abstract

Hard cases of pronoun resolution have been used as a long-standing benchmark for commonsense reasoning. In the recent literature, pre-trained language models have been used to obtain state-of-the-art results on pronoun resolution. Overall, four categories of training and evaluation objectives have been introduced. The variety of training datasets and pre-trained language models used in these works makes it unclear whether the choice of training objective is critical. In this work, we make a fair comparison of the performance and seed-wise stability of four models that represent the four categories of objectives. Our experiments show that the objective of sequence ranking performs the best in-domain, while the objective of semantic similarity between candidates and pronoun performs the best out-of-domain. We also observe a seed-wise instability of the model using sequence ranking, which is not the case when the other objectives are used.
