论文标题

优先考虑平衡图分区的限制算法

Prioritized Restreaming Algorithms for Balanced Graph Partitioning

论文作者

Awadelkarim, Amel, Ugander, Johan

论文摘要

平衡图分区是许多具有关系数据的大规模分布式计算的关键步骤。随着图形数据集的大小和密度的增长,一系列高度估计的平衡分区算法似乎已经满足了不同域之间各种需求。作为当前工作的起点,我们观察到,最近介绍了两个迭代分区的家族 - 基于限制性的家族以及基于平衡的标签传播的家族(包括Facebook的社交哈希分区者) - 可以通过常见的模块化设计决策框架来查看。在这种模块化的角度的帮助下,我们发现设计决策的关键组合导致了一种新型算法系列,其经验性能比任何现有的高度可观算法在广泛的现实图表上都更好。由此产生的优先级限制算法采用基于乘法权重的约束管理策略,从限制文献中借来,同时采用平衡标签传播中的优先级概念来优化流流过程的排序。我们的实验结果考虑了一系列的流订单,在所谓的平衡分区的削减质量方面,基于我们所谓的矛盾性的动态顺序广泛是最具性能的,其基于程度的静态排序几乎同样好。

Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared to meet varied demands across different domains. As the starting point for the present work, we observe that two recently introduced families of iterative partitioners---those based on restreaming and those based on balanced label propagation (including Facebook's Social Hash Partitioner)---can be viewed through a common modular framework of design decisions. With the help of this modular perspective, we find that a key combination of design decisions leads to a novel family of algorithms with notably better empirical performance than any existing highly-scalable algorithm on a broad range of real-world graphs. The resulting prioritized restreaming algorithms employ a constraint management strategy based on multiplicative weights, borrowed from the restreaming literature, while adopting notions of priority from balanced label propagation to optimize the ordering of the streaming process. Our experimental results consider a range of stream orders, where a dynamic ordering based on what we call ambivalence is broadly the most performative in terms of the cut quality of the resulting balanced partitions, with a static ordering based on degree being nearly as good.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源