最小化局部比率削减了超图中的目标

论文标题

最小化局部比率削减了超图中的目标

Minimizing Localized Ratio Cut Objectives in Hypergraphs

论文作者

Veldt, Nate, Benson, Austin R., Kleinberg, Jon

论文摘要

HyperGraphs是在数据中建模多线关系的有用抽象，而HyperGraph群集是检测此类数据中紧密相关节点的组的任务。图表群集已经进行了广泛的研究，并且有许多方法用于检测小型的局部簇，而无需探索整个输入图。但是，在超图中，只有几种专门的方法。在这里，我们提出了一个基于最小化局部比率削减目标的本地超图聚类的框架。我们的框架在HyperGraph中采用了一组参考节点，并解决了一系列HyperGraph最低$ S $ -S $ -T $切割问题，以确定附近良好连接的节点群，该群集与输入集大致重叠。我们的方法扩展了基于图的技术，但更加通用，并具有新的输出质量保证。首先，我们的方法可以最大程度地减少新的广义剪切概念，这些概念取决于每个高架中节点的特定配置，而不仅仅是切割的超匹配数。其次，我们的框架在输出群集质量方面具有几种有吸引力的理论特性。最重要的是，我们的算法是强烈的本地，这意味着其运行时仅取决于输入集的大小，并且不需要探索整个超图即可找到良好的本地簇。我们使用我们的方法来有效地识别现实世界中数据的簇，这些簇具有数百万个节点，数百万的超增强和较大的平均高度尺寸，并且在几秒钟至几分钟之间，跑步时间范围范围很大。

Hypergraphs are a useful abstraction for modeling multiway relationships in data, and hypergraph clustering is the task of detecting groups of closely related nodes in such data. Graph clustering has been studied extensively, and there are numerous methods for detecting small, localized clusters without having to explore an entire input graph. However, there are only a few specialized approaches for localized clustering in hypergraphs. Here we present a framework for local hypergraph clustering based on minimizing localized ratio cut objectives. Our framework takes an input set of reference nodes in a hypergraph and solves a sequence of hypergraph minimum $s$-$t$ cut problems in order to identify a nearby well-connected cluster of nodes that overlaps substantially with the input set. Our methods extend graph-based techniques but are significantly more general and have new output quality guarantees. First, our methods can minimize new generalized notions of hypergraph cuts, which depend on specific configurations of nodes within each hyperedge, rather than just on the number of cut hyperedges. Second, our framework has several attractive theoretical properties in terms of output cluster quality. Most importantly, our algorithm is strongly-local, meaning that its runtime depends only on the size of the input set, and does not need to explore the entire hypergraph to find good local clusters. We use our methodology to effectively identify clusters in hypergraphs of real-world data with millions of nodes, millions of hyperedges, and large average hyperedge size with runtimes ranging between a few seconds and a few minutes.

下载PDF全文

下载文献需遵守相关版权规定

论文标题