论文标题
degreesketch:带有应用程序的大图上的分布式基数草图
DegreeSketch: Distributed Cardinality Sketches on Massive Graphs with Applications
论文作者
论文摘要
我们提出了一种半流的分布式草图数据结构,并演示了其估算本地邻域大小和局部三角形的效用,并在大量图上进行了重击。学位是由以顶点为中心的基数草图组成的,分布在单个通行证中积累的一组处理器上,然后作为持续的查询引擎的行为,能够大约回答与邻接设置工会和相互作用大小有关的图形查询。顶点的$ t $ th当地社区是从$ v $中到达$ g $的顶点数量,最多可以通过$ t $ edge,而本地三角数是其中包括的3个赛车的数量。这两个指标在图形分析应用程序中都有用,但是随着图形大小的增长,精确的计算尺寸较差。我们提出了有效的算法,用于估算使用degreesketch的本地邻域大小和本地三角量重击。在我们的实验中,我们使用著名的HyperLogLog基数草图实现了Degreesketch,并利用分布式通信工具YGM在分布式内存中实现最新性能。
We present DegreeSketch, a semi-streaming distributed sketch data structure and demonstrate its utility for estimating local neighborhood sizes and local triangle count heavy hitters on massive graphs. DegreeSketch consists of vertex-centric cardinality sketches distributed across a set of processors that are accumulated in a single pass, and then behaves as a persistent query engine capable of approximately answering graph queries pertaining to the sizes of adjacency set unions and intersections. The $t$th local neighborhood of a vertex is the number of vertices reachable in $G$ from $v$ by traversing at most $t$ edges, whereas the local triangle count is the number of 3-cycles in which it is included. Both metrics are useful in graph analysis applications, but exact computations scale poorly as graph sizes grow. We present efficient algorithms for estimating both local neighborhood sizes and local triangle count heavy hitters using DegreeSketch. In our experiments we implement DegreeSketch using the celebrated hyperloglog cardinality sketch and utilize the distributed communication tool YGM to achieve state-of-the-art performance in distributed memory.