Paper Title
Anomaly Detection for a Large Number of Streams: A Permutation-Based Higher Criticism Approach
Paper Authors
Abstract
Anomaly detection when observing a large number of data streams is essential in a variety of applications, ranging from epidemiological studies to the monitoring of complex systems. High-dimensional scenarios are usually tackled with scan statistics and related methods, which require stringent modeling assumptions for proper calibration. In this work we take a non-parametric stance and propose a permutation-based variant of the higher criticism statistic that does not require knowledge of the null distribution. This results in a test that is exact in finite samples and asymptotically optimal in the wide class of exponential models. We demonstrate that the power loss in finite samples is minimal with respect to the oracle test. Furthermore, since the proposed statistic does not rely on asymptotic approximations, it typically performs better than popular variants of higher criticism that rely on such approximations. We include recommendations so that the test can be readily applied in practice, and demonstrate its applicability in monitoring the content uniformity of an active ingredient for a batch-produced drug product.
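To make the idea in the abstract concrete, the following is a minimal sketch of a permutation-calibrated, higher-criticism-type test. It is not the statistic defined in the paper: the threshold grid (the observed stream means), the +1 smoothing of the estimated null exceedance probability, and all function and parameter names (`permuted_stream_means`, `hc_statistic`, `permutation_hc_test`, `n_ref`, `n_perm`) are assumptions made for illustration. The data are assumed to form an array with one row per stream and the same number of observations in every stream.

```python
import numpy as np


def permuted_stream_means(x, n_ref, rng):
    """Stream means obtained by shuffling all observations across streams.

    The pooled values are sorted first, so the result depends only on the
    pooled data (and the random generator), not on how the observations
    were originally assigned to streams.
    """
    n, t = x.shape
    pooled = np.sort(x.ravel())
    return np.concatenate(
        [rng.permutation(pooled).reshape(n, t).mean(axis=1) for _ in range(n_ref)]
    )


def hc_statistic(x, null_means):
    """Largest standardized excess of stream-mean exceedance counts.

    Thresholds are the observed stream means (an illustrative choice).
    """
    thresholds = np.sort(x.mean(axis=1))
    n = thresholds.size
    null_sorted = np.sort(null_means)
    m = null_sorted.size
    # Number of observed streams whose mean is >= each threshold.
    counts = n - np.searchsorted(thresholds, thresholds, side="left")
    # Estimated null exceedance probability, smoothed so that it is never 0.
    null_exceed = m - np.searchsorted(null_sorted, thresholds, side="left")
    p = (1.0 + null_exceed) / (1.0 + m)
    valid = p < 1.0  # avoid dividing by zero when p equals 1
    if not valid.any():
        return 0.0
    p, counts = p[valid], counts[valid]
    return float(np.max((counts - n * p) / np.sqrt(n * p * (1.0 - p))))


def permutation_hc_test(x, n_ref=200, n_perm=199, seed=0):
    """Return the observed statistic and a Monte Carlo permutation p-value."""
    x = np.asarray(x, dtype=float)
    n, t = x.shape
    rng = np.random.default_rng(seed)
    # Permuting does not change the pooled values, so this reference sample
    # is computed once and reused for every permuted data set.
    null_means = permuted_stream_means(x, n_ref, rng)
    observed = hc_statistic(x, null_means)
    flat = x.ravel()
    exceed = 0
    for _ in range(n_perm):
        x_perm = rng.permutation(flat).reshape(n, t)
        if hc_statistic(x_perm, null_means) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one of the permutations.
    return observed, (1 + exceed) / (1 + n_perm)


# Usage on synthetic data: 100 streams of 10 observations, 10 streams shifted.
data_rng = np.random.default_rng(1)
data = data_rng.normal(size=(100, 10))
data[:10] += 3.0
stat, p_value = permutation_hc_test(data)
print(f"statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

Because the reference sample of permuted stream means depends only on the pooled observations, the statistic is the same function of the data for every permutation, which is what makes the Monte Carlo p-value valid without any knowledge of the null distribution. The sketch makes no claim about power or optimality; those properties are established in the paper for the authors' own statistic.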