Paper Title

Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

Authors

Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett

Abstract

The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error-type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.
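The stratified evaluation the abstract describes can be illustrated with a small sketch. This is not the paper's code: the data format, the `metric_score` field, the threshold, and the use of balanced accuracy as the per-stratum score are all illustrative assumptions; the toy annotations are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): group factuality
# annotations by the underlying summarization model, then score a metric's
# error-detection performance separately within each stratum.
from collections import defaultdict

def balanced_accuracy(labels, preds):
    """Mean of per-class recall for binary labels (1 = factual, 0 = error)."""
    recall = {}
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if not idx:  # stratum lacks one class; score undefined
            return None
        recall[cls] = sum(preds[i] == cls for i in idx) / len(idx)
    return (recall[0] + recall[1]) / 2

def stratified_scores(annotations, threshold=0.5):
    """annotations: dicts with keys 'model' (summarizer name),
    'label' (1 = factual, 0 = error), and 'metric_score'
    (hypothetical metric output; higher = judged more factual)."""
    strata = defaultdict(list)
    for ex in annotations:
        strata[ex["model"]].append(ex)
    return {
        model: balanced_accuracy(
            [ex["label"] for ex in exs],
            [int(ex["metric_score"] >= threshold) for ex in exs],
        )
        for model, exs in strata.items()
    }

# Toy annotations: one Transformer-era and one pre-Transformer summarizer.
data = [
    {"model": "BART", "label": 1, "metric_score": 0.9},
    {"model": "BART", "label": 0, "metric_score": 0.2},
    {"model": "PGN",  "label": 1, "metric_score": 0.4},
    {"model": "PGN",  "label": 0, "metric_score": 0.3},
]
print(stratified_scores(data))
```

Reporting one score per stratum, rather than pooling all summaries, is what exposes the gap the paper highlights: a metric can look strong overall while performing well only on summaries from older models.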
