Paper title
Modelling excess zeros in count data: A new perspective on modelling approaches
Authors
Abstract
We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically relative to the Poisson distribution. We focus on two alternative modelling approaches, Over-Dispersion (OD) models and Zero-Inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer to these as Implicit and Explicit ZI models, respectively. Although sometimes seen as competing approaches, they can be complementary: OD is a consequence of ZI modelling, and ZI is a by-product of OD modelling. The central objective in such analyses is often inference on the effect of covariates on the mean, in light of the apparent excess of zeros in the counts. Typically, the modelling of the excess zeros per se is a secondary objective, and there are choices to be made between, and within, the OD and ZI approaches. The contribution of this paper is primarily conceptual. We contrast, descriptively, the impact of the two approaches on zeros. We further offer a novel descriptive characterisation of alternative ZI models, including the classic hurdle and mixture models, by providing a unifying theoretical framework for their comparison. This in turn leads to a novel and technically simpler ZI model. We develop the underlying theory for univariate counts and touch on its implications for multivariate count data.
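The claim that OD and ZI are two sides of the same coin can be illustrated numerically. The sketch below is not taken from the paper: it assumes the negative binomial as the OD (implicit ZI) model and the zero-inflated Poisson mixture as the explicit ZI model, and the function names and parameter values are hypothetical. It compares the probability of a zero under each model with that of a Poisson distribution having the same mean.

```python
import math


def poisson_p0(mu):
    """P(Y = 0) for a Poisson count with mean mu."""
    return math.exp(-mu)


def negbin_moments(mu, k):
    """Mean, variance and P(Y = 0) for a negative binomial count.

    Assumed parameterisation: mean mu, variance mu + mu**2 / k.
    """
    var = mu + mu ** 2 / k
    p0 = (k / (k + mu)) ** k
    return mu, var, p0


def zip_moments(pi, lam):
    """Mean, variance and P(Y = 0) for a zero-inflated Poisson mixture.

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    mean = (1 - pi) * lam
    var = (1 - pi) * lam * (1 + pi * lam)
    p0 = pi + (1 - pi) * math.exp(-lam)
    return mean, var, p0


if __name__ == "__main__":
    # Illustrative values only: all three models share the mean 2.
    mu = 2.0
    print("Poisson      P(0) =", poisson_p0(mu))
    print("NegBin       mean, var, P(0) =", negbin_moments(mu, k=1.0))
    print("ZIP mixture  mean, var, P(0) =", zip_moments(pi=0.5, lam=4.0))
```

With these illustrative values the negative binomial and the mixture share both mean and variance, yet assign different probabilities to zero, and both exceed the Poisson zero probability. The OD model inflates zeros implicitly, while the explicitly zero-inflated model has a variance larger than its mean.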