Unsupervised Multi-Granularity Summarization

Paper Title

Unsupervised Multi-Granularity Summarization

Authors

Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han

Abstract

Text summarization is a user-preference-based task: for a single document, different users often have different priorities for the summary. As a key aspect of customization in summarization, granularity measures the semantic coverage between the summary and the source document. However, developing systems that can generate summaries with customizable semantic coverage is still an under-explored topic. In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum. We take events as the basic semantic units of the source documents and propose to rank these events by their salience. We also develop a model that summarizes input documents with given events as anchors and hints. By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner. In addition, we annotate a new benchmark, GranuDUC, that contains multiple summaries at different granularities for each document cluster. Experimental results confirm the substantial superiority of GranuSum over strong baselines on multi-granularity summarization. Further, by exploiting the event information, GranuSum also exhibits state-of-the-art performance under the conventional unsupervised abstractive setting. The dataset for this paper can be found at: https://github.com/maszhongming/GranuDUC
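
The idea of controlling granularity through the number of input events can be illustrated with a minimal sketch. This is not the authors' implementation: the salience scores are assumed to be precomputed by some ranker, and the function names (`select_events`, `build_hinted_input`) and the hint separators are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def select_events(scored_events: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the k most salient events, highest score first.

    `scored_events` pairs each event string with a salience score that is
    assumed to come from an external ranker (not shown here).
    """
    ranked = sorted(scored_events, key=lambda pair: pair[1], reverse=True)
    return [event for event, _ in ranked[:k]]


def build_hinted_input(events: List[str], document: str) -> str:
    """Prefix the document with the chosen events as hints.

    The " | " and " </s> " separators are hypothetical; a real summarizer
    would define its own input format.
    """
    return " | ".join(events) + " </s> " + document


# Toy data: event strings and scores are made up for illustration.
scored = [("storm hits coast", 0.9), ("schools close", 0.5), ("mayor speaks", 0.2)]
document = "A storm hit the coast on Monday. Schools closed and the mayor spoke."

coarse_input = build_hinted_input(select_events(scored, 1), document)
fine_input = build_hinted_input(select_events(scored, 3), document)
```

Passing fewer events (`coarse_input`) would ask a downstream summarizer for a coarser summary, while passing more events (`fine_input`) would ask for a more detailed one.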
