Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents

Paper Title

Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents

Authors

Marcio Fonseca, Yftah Ziser, Shay B. Cohen

Abstract

We argue that disentangling content selection from the budget used to cover salient content improves the performance and applicability of abstractive summarizers. Our method, FactorSum, does this disentanglement by factorizing summarization into two steps through an energy function: (1) generation of abstractive summary views; (2) combination of these views into a final summary, following a budget and content guidance. This guidance may come from different sources, including from an advisor model such as BART or BigBird, or in oracle mode -- from the reference. This factorization achieves significantly higher ROUGE scores on multiple benchmarks for long document summarization, namely PubMed, arXiv, and GovReport. Most notably, our model is effective for domain adaptation. When trained only on PubMed samples, it achieves a 46.29 ROUGE-1 score on arXiv, which indicates a strong performance due to more flexible budget adaptation and content selection less dependent on domain-specific textual structure.
