Outlier Detection for Multi-Network Data

Paper Title

Outlier Detection for Multi-Network Data

Authors

Pritam Dey, Zhengwu Zhang, David B. Dunson

Abstract

It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data, in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are of such poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications.
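
To make the setting concrete, the following is a minimal sketch of influence-based screening on multi-network data. It is not the ODIN algorithm: the abstract does not give the hierarchical generalized linear model or its influence measure, so this sketch swaps in a much cruder stand-in, an independent edge-wise Bernoulli model with a leave-one-out shift in log-odds. The function names, the add-half smoothing, the median/MAD cutoff, and the simulated data are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_networks(n_subjects, n_nodes, edge_prob, seed=0):
    """Simulate symmetric binary adjacency matrices on a shared node set."""
    rng = random.Random(seed)
    nets = []
    for _ in range(n_subjects):
        a = [[0] * n_nodes for _ in range(n_nodes)]
        for u in range(n_nodes):
            for v in range(u + 1, n_nodes):
                a[u][v] = a[v][u] = int(rng.random() < edge_prob)
        nets.append(a)
    return nets


def upper_triangle(a):
    """Flatten the strict upper triangle (one entry per undirected edge)."""
    n = len(a)
    return [a[u][v] for u in range(n) for v in range(u + 1, n)]


def logit(p):
    return math.log(p / (1.0 - p))


def influence_scores(nets):
    """Mean squared leave-one-out shift in edge-wise log-odds.

    Assumption: a crude influence proxy under an independent Bernoulli
    model per edge, NOT the influence measure used by ODIN.
    """
    edges = [upper_triangle(a) for a in nets]
    n, m = len(edges), len(edges[0])
    totals = [sum(e[j] for e in edges) for j in range(m)]
    scores = []
    for e in edges:
        shift = 0.0
        for j in range(m):
            # Add-half smoothing keeps both estimates strictly inside (0, 1).
            full = (totals[j] + 0.5) / (n + 1)
            loo = (totals[j] - e[j] + 0.5) / n
            shift += (logit(full) - logit(loo)) ** 2
        scores.append(shift / m)
    return scores


def flag_outliers(scores, k=3.0):
    """Flag scores more than k robust standard deviations above the median."""
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores)
    cutoff = med + k * 1.4826 * mad
    return [i for i, s in enumerate(scores) if s > cutoff]


# Hypothetical data: 30 subjects, 15 brain regions, one failed reconstruction.
nets = simulate_networks(n_subjects=30, n_nodes=15, edge_prob=0.6, seed=1)
nets[7] = [[0] * 15 for _ in range(15)]  # subject 7: all-zero adjacency matrix

scores = influence_scores(nets)
flagged = flag_outliers(scores)
print("flagged subjects:", flagged)
```

The all-zero network mimics the "mostly zero" failure mode described in the abstract. A subject whose removal shifts the fitted edge probabilities far more than a typical subject's removal receives a high score, which is the general idea behind influence-based outlier detection, here in a deliberately simplified form.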
