Paper Title

A Tree-based Semi-Varying Coefficient Model for the COM-Poisson Distribution

Authors

Suneel Babu Chatla, Galit Shmueli

Abstract

We propose a tree-based semi-varying coefficient model for the Conway-Maxwell-Poisson (CMP or COM-Poisson) distribution, which is a two-parameter generalization of the Poisson distribution and is flexible enough to capture both under-dispersion and over-dispersion in count data. The advantage of tree-based methods is their scalability to high-dimensional data. We develop CMPMOB, an estimation procedure for a semi-varying coefficient model, using model-based recursive partitioning (MOB). The proposed framework is broader than the existing MOB framework because it allows node-invariant effects to be included in the model. To reduce the computational burden of the exhaustive search employed in the original MOB algorithm, we propose a new split point estimation procedure that borrows tools from change point estimation methodology. The proposed method uses only the estimated score functions, without fitting a model for each candidate split point, and is therefore computationally simpler. Since tree-based methods provide only a piecewise constant approximation to the underlying smooth function, we also propose the CMPBoost semi-varying coefficient model, which uses gradient boosting for estimation. The usefulness of the proposed methods is illustrated using simulation studies and a real example from a bike-sharing system in Washington, DC.
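As a small illustration of the dispersion behaviour mentioned in the abstract, the sketch below evaluates the standard CMP probability mass function, P(Y = y) proportional to lambda^y / (y!)^nu, and compares the variance-to-mean ratio for a few values of nu. It is not the paper's CMPMOB or CMPBoost code. The function names `cmp_pmf` and `mean_var` and the truncation point `max_y` are assumptions made for this example, and the infinite normalizing sum is approximated by truncation.

```python
import math


def cmp_pmf(lam, nu, max_y=200):
    """Return [P(Y=0), ..., P(Y=max_y)] for a CMP(lam, nu) distribution.

    The normalizing constant is an infinite series; here it is approximated
    by truncating at max_y (an assumption that is adequate for moderate lam).
    """
    # Log of the unnormalized weights: y*log(lam) - nu*log(y!)
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def mean_var(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    # nu < 1: over-dispersed, nu = 1: Poisson, nu > 1: under-dispersed
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_var(cmp_pmf(3.0, nu))
        print(f"nu={nu}: variance/mean = {var / mean:.3f}")
```

With nu = 1 the distribution reduces to the Poisson, so the variance equals the mean. Values of nu below 1 give a ratio above 1 (over-dispersion), and values above 1 give a ratio below 1 (under-dispersion).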
