论文标题
SSUMM:大量图的稀疏总结
SSumM: Sparse Summarization of Massive Graphs
论文作者
论文摘要
给定图G和所需的尺寸k位,我们如何在k位内总结G,同时最大程度地减少信息丢失? 大规模图已成为无处不在的,带来了巨大的计算挑战。如果对如此大的图形进行分析,则可以轻松轻松地压缩以适合主内存甚至缓存。图形摘要在图形压缩技术之间产生了带有合并节点的粗粒摘要图,在图形压缩技术之间脱颖而出。因此,已经开发了许多算法,以获取简明的摘要图,而信息丢失很少或相当小的重建误差。但是,现有方法仅着眼于减少节点的数量,并且它们通常会产生密集的摘要图,从而无法实现更好的压缩率。此外,由于其有限的可伸缩性,它们只能应用于中等大小的图。 在这项工作中,我们提出了SSUMM,SSUMM是一种可扩展的图形符号化算法,可产生稀疏的摘要图。 Ssumm不仅将节点合并在一起,而且还散布了摘要图,并且两种策略是根据最小描述长度原理仔细平衡的。与最先进的竞争对手相比,SSUMM是(a)简洁:最高可产生11.2倍较小的摘要图,具有相似的重建误差,(b)准确:实现多达4.2倍的较小的重建误差,具有类似简洁的输出,并且(c)可缩放:摘要26x较大的图形,同时展示线性scale scaleal scalear scalear clinear scalear clinear clinear scalearsibal scalear clinear clinear clinear clinear scalear clinear scalear scaplyability。我们通过在10个现实世界图上进行广泛的实验来验证这些优势。
Given a graph G and the desired size k in bits, how can we summarize G within k bits, while minimizing the information loss? Large-scale graphs have become omnipresent, posing considerable computational challenges. Analyzing such large graphs can be fast and easy if they are compressed sufficiently to fit in main memory or even cache. Graph summarization, which yields a coarse-grained summary graph with merged nodes, stands out with several advantages among graph compression techniques. Thus, a number of algorithms have been developed for obtaining a concise summary graph with little information loss or equivalently small reconstruction error. However, the existing methods focus solely on reducing the number of nodes, and they often yield dense summary graphs, failing to achieve better compression rates. Moreover, due to their limited scalability, they can be applied only to moderate-size graphs. In this work, we propose SSumM, a scalable and effective graph-summarization algorithm that yields a sparse summary graph. SSumM not only merges nodes together but also sparsifies the summary graph, and the two strategies are carefully balanced based on the minimum description length principle. Compared with state-of-the-art competitors, SSumM is (a) Concise: yields up to 11.2X smaller summary graphs with similar reconstruction error, (b) Accurate: achieves up to 4.2X smaller reconstruction error with similarly concise outputs, and (c) Scalable: summarizes 26X larger graphs while exhibiting linear scalability. We validate these advantages through extensive experiments on 10 real-world graphs.