Paper title

WIDAR -- Weighted Input Document Augmented ROUGE

Authors

Raghav Jain, Vaibhav Mavi, Anubhav Jangra, Sriparna Saha

Abstract

The task of automatic text summarization has gained a lot of traction due to recent advances in machine learning techniques. However, evaluating the quality of a generated summary remains an open problem. The literature has widely adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the standard evaluation metric for summarization. However, ROUGE has some long-established limitations, a major one being its dependence on the availability of good-quality reference summaries. In this work, we propose the metric WIDAR, which, in addition to the reference summary, also uses the input document to evaluate the quality of the generated summary. The proposed metric is versatile, since it is designed to adapt the evaluation score according to the quality of the reference summary. Compared with ROUGE, the proposed metric achieves 26%, 76%, 82%, and 15% higher correlation with the human judgement scores provided in the SummEval dataset for coherence, consistency, fluency, and relevance, respectively. The proposed metric obtains results comparable to other state-of-the-art metrics while requiring relatively short computation time.
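ROUGE, the baseline the abstract compares against, scores a generated summary purely by n-gram overlap with a reference summary. The following is a minimal sketch of ROUGE-N under simplifying assumptions (lowercase whitespace tokenization, no stemming or stopword handling), so its numbers will not match the official ROUGE toolkit. It is not the WIDAR metric, whose formula the abstract does not give, and the helper names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap of candidate vs. reference.

    Assumption: lowercase whitespace tokenization only, so scores differ
    from the official ROUGE package. Returns (recall, precision, f1).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps min(count_in_candidate, count_in_reference).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * recall * precision / (recall + precision)
    return recall, precision, f1


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    # Two hypothetical references: one lexically close, one a loose paraphrase.
    print(rouge_n(candidate, "the cat was sitting on the mat"))
    print(rouge_n(candidate, "a feline rested"))
```

Against the first reference the candidate gets a ROUGE-1 recall of 5/7; against the second, which shares no words with it, every score is 0.0. The score is determined entirely by whichever reference is supplied, which is the reference dependence that WIDAR addresses by also consulting the input document.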