Paper Title
On the Trade-off between Redundancy and Local Coherence in Summarization
Paper Authors
Paper Abstract
Extractive summaries are usually presented as lists of sentences with no expected cohesion between them and, unless redundancy is explicitly accounted for, with plenty of redundant information. In this paper, we investigate the trade-offs incurred when controlling for inter-sentential cohesion and redundancy in extracted summaries, and the impact of that control on summary informativeness. As a case study, we focus on the summarization of long, highly redundant documents and consider two optimization scenarios: reward-guided and unsupervised. In the reward-guided scenario, we compare systems that control for redundancy and cohesion during sentence scoring. In the unsupervised scenario, we introduce two systems that aim to control all three properties (informativeness, redundancy, and cohesion) in a principled way. Both systems implement a psycholinguistic theory that simulates how humans keep track of relevant content units and how cohesion and non-redundancy constraints are applied in short-term memory during reading. Extensive automatic and human evaluations reveal that systems optimizing for cohesion, among other properties, organize content in summaries better than systems that optimize only for redundancy, while maintaining comparable informativeness. We find that the proposed unsupervised systems extract highly cohesive summaries across varying levels of document redundancy, although at the cost of some informativeness. Finally, we provide evidence of how the simulated cognitive processes affect the trade-off between the analyzed summary properties.