Paper Title
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Authors

Daniel Deutsch, Dan Roth

Abstract
Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct or not, a task known as answer verification. In this work, we benchmark the lexical answer verification methods which have been used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: In some scenarios, using a worse verification method -- or using none at all -- has comparable performance to using the best verification method, a result that we attribute to properties of the datasets.
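To make the answer verification task concrete, the sketch below shows the kind of lexical verification the abstract refers to: normalizing both strings and scoring a QA prediction against a reference answer with SQuAD-style exact match and token-level F1, then accepting the prediction if the F1 score clears a threshold. This is an illustrative example, not the paper's exact implementation; the `threshold` value and function names are assumptions.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def exact_match(prediction, reference):
    """True if the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)

def token_f1(prediction, reference):
    """Token-level F1 between prediction and reference (SQuAD-style)."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def is_correct(prediction, reference, threshold=0.5):
    """Lexical answer verification: accept if token F1 clears the
    threshold (the threshold value here is an assumption)."""
    return token_f1(prediction, reference) >= threshold
```

The paper's finding is that replacing this kind of lexical check with a learned comparison model such as LERC improves verification accuracy in some settings, but that the gain does not always carry through to the final metric's correlation with summary quality.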