论文标题
网络通过尾巴不平等均匀
Network homophily via tail inequalities
论文作者
论文摘要
同质性是“相似性繁殖连接”的原则。我们在网络中提供了该原理的定量表述。给定一个网络和标记的其顶点分区,每个分区的每个类别的矢量索引,其条目是相应类引起的子图的边数,被视为通过在标签的类别中以随机分区选择具有同一cardinal cordinal contrical contrical continal continal cartical partitions所描述的随机矢量的结果。从这个角度来看,任何同质得分$θ$的值,即在观察到的结果中评估的分区类别诱导的子图的大小中的不降低的实际有价值函数,可以被视为观察到的随机变量的值。因此,根据分数$θ$,只要观察值$θ$的值至少与观察到的值一样极端时,输入网络在显着性水平$α$上是同质的,小于$α$。正如我们所显示的那样,即使近似$α$也是一个NP硬性问题,因此我们诉诸古典的尾巴不等式,从上面限制了$α$。这些上限是通过专门化$θ$获得的,产生了一类网络同质的量词。计算上限需要了解随机矢量的协方差矩阵,该矩阵以前在随机着色模型中尚不清楚。在本文中,我们缩小了这一差距,从而提供了一个有意义的,易于计算的索引,用于测量网络同质的网络。正如现实世界网络应用程序所证明的那样,这些索引是有效,可靠的,并导致了新发现,这些发现无法被当前的最新状态捕获。
Homophily is the principle whereby "similarity breeds connections". We give a quantitative formulation of this principle within networks. Given a network and a labeled partition of its vertices, the vector indexed by each class of the partition, whose entries are the number of edges of the subgraphs induced by the corresponding classes, is viewed as the observed outcome of the random vector described by picking labeled partitions at random among labeled partitions whose classes have the same cardinalities as the given one. In this perspective, the value of any homophily score $Θ$, namely a non decreasing real valued function in the sizes of subgraphs induced by the classes of the partition, evaluated at the observed outcome, can be thought of as the observed value of a random variable. Consequently, according to the score $Θ$, the input network is homophillic at the significance level $α$ whenever the one-sided tail probability of observing a value of $Θ$ at least as extreme as the observed one, is smaller than $α$. Since, as we show, even approximating $α$ is an NP-hard problem, we resort to classical tails inequality to bound $α$ from above. These upper bounds, obtained by specializing $Θ$, yield a class of quantifiers of network homophily. Computing the upper bounds requires the knowledge of the covariance matrix of the random vector which was not previously known within the random coloring model. In this paper we close this gap, giving a meaningful, easy to compute class of indices for measuring network homophily. As demonstrated in real-world network applications, these indices are effective, reliable, and lead to new discoveries that could not be captured by the current state of the art.