论文标题

大规模定向网络的随机光谱共聚类

Randomized spectral co-clustering for large-scale directed networks

论文作者

Guo, Xiao, Qiu, Yixuan, Zhang, Hai, Chang, Xiangyu

论文摘要

定向网络广泛用于表示单位之间的不对称关系。共簇旨在同时聚集有向网络的发件人和接收器。特别是,众所周知的光谱聚类算法可以修改为光谱共聚类到共簇的有向网络。但是,大规模网络对其构成了巨大的计算挑战。在本文中,我们利用素描技术并得出两种随机光谱共聚类算法,一种\ emph {基于随机反射},另一个基于\ emph {基于随机smpling {基于随机抽样的},以加速大规模定向网络的共聚类。我们从理论上分析了两个生成模型下的所得算法 - 随机盖块模型和由度校正的随机盖块模型,并建立其近似错误率和错误分布错误率,表明比共群集文献的最先进的结果更好。从数值上讲,我们设计和进行模拟以支持我们的理论结果并测试具有多达数百万个节点的真实网络上算法的效率。开发了一个公开可用的R软件包\ Textsf {randclust},以更好地可用性和可重复性。

Directed networks are broadly used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose great computational challenges to it. In this paper, we leverage sketching techniques and derive two randomized spectral co-clustering algorithms, one \emph{random-projection-based} and the other \emph{random-sampling-based}, to accelerate the co-clustering of large-scale directed networks. We theoretically analyze the resulting algorithms under two generative models -- the stochastic co-block model and the degree-corrected stochastic co-block model, and establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature. Numerically, we design and conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to millions of nodes. A publicly available R package \textsf{RandClust} is developed for better usability and reproducibility of the proposed methods.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源