Paper title
Cluster Detection Capabilities of the Average Nearest Neighbor Ratio and Ripley's K Function on Areal Data: an Empirical Assessment
Paper authors
Paper abstract
Spatial clustering detection methods are widely used in many fields, including epidemiology, ecology, biology, physics, and sociology. In these fields, areal data is often of interest; such data may result from spatial aggregation (e.g., the number of disease cases in a county) or may be inherent attributes of the areal unit as a whole (e.g., the habitat suitability of a conserved land parcel). This study assesses the performance of two spatial clustering detection methods on areal data: the average nearest neighbor (ANN) ratio and Ripley's K function. These methods are designed for point process data, but their ease of implementation in GIS software (e.g., in ESRI ArcGIS) and the lack of analogous methods for areal data have contributed to their use for areal data. Although these methods are commonly applied to areal data, little research has explored their properties in that context. In this paper, we conduct a simulation study to evaluate the performance of each method under various areal structures and types of spatial dependence. The simulations show that the traditional approach to hypothesis testing using the ANN ratio or Ripley's K function results in inflated empirical type I error rates when applied to areal data. We demonstrate that this issue can be remedied for both approaches by using Monte Carlo methods that acknowledge the areal nature of the data to estimate the distribution of the test statistic under the null hypothesis. While such an approach is not currently implemented in ArcGIS, it can be easily done in R using code provided by the authors.
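The Monte Carlo idea in the abstract can be illustrated with a minimal sketch. This is not the authors' R code, and the abstract does not spell out their null model. The sketch assumes that each areal unit is represented by its centroid, that some units are "flagged" (for example, units with a case), and that under the null hypothesis the flagged labels are assigned to units uniformly at random. The function names, the grid layout, and the choice of flagged cells are all hypothetical. The test statistic is the mean nearest-neighbor distance among flagged centroids; dividing it by the expected distance under complete spatial randomness gives the ANN ratio, and because that divisor is constant for a fixed number of points and study area, the Monte Carlo p-value is the same either way.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def monte_carlo_ann_test(centroids, flagged, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value for clustering of flagged areal units.

    Assumed null model (not taken from the paper): the flagged labels are
    assigned to areal units uniformly at random, so each simulation redraws
    the same number of units from the fixed set of centroids.
    """
    rng = random.Random(seed)
    k = len(flagged)
    observed = mean_nn_distance([centroids[i] for i in flagged])
    sims = [mean_nn_distance(rng.sample(centroids, k)) for _ in range(n_sim)]
    # Small mean nearest-neighbor distances indicate clustering.
    n_as_extreme = sum(s <= observed for s in sims)
    p_value = (n_as_extreme + 1) / (n_sim + 1)
    return observed, p_value


# Hypothetical areal structure: a 10 x 10 grid of unit-square cells.
centroids = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
# Hypothetical flagged units: a 3 x 3 block of cells in one corner.
clustered = [0, 1, 2, 10, 11, 12, 20, 21, 22]

observed, p_value = monte_carlo_ann_test(centroids, clustered)
print(f"mean NN distance = {observed:.3f}, Monte Carlo p-value = {p_value:.3f}")
```

Because the reference distribution is built from the same fixed centroids as the observed data, it reflects the minimum spacing imposed by the areal units, which a point-process null with points placed anywhere in the study region would ignore.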