All models are wrong, but which are useful? Comparing parametric and nonparametric estimation of causal effects in finite samples

Paper title

All models are wrong, but which are useful? Comparing parametric and nonparametric estimation of causal effects in finite samples

Authors

Rudolph, Kara E.; Williams, Nicholas; Miles, Caleb H.; Antonelli, Joseph; Diaz, Ivan

Abstract

There is a long-standing debate in the statistical, epidemiological, and econometric fields as to whether nonparametric estimation that uses data-adaptive methods, such as machine learning algorithms in model fitting, confers any meaningful advantage over simpler, parametric approaches in real-world, finite-sample estimation of causal effects. We address the question: when trying to estimate the effect of a treatment on an outcome, across a universe of reasonable data distributions, how much does the choice of nonparametric vs. parametric estimation matter? Instead of answering this question with simulations that reflect a few chosen data scenarios, we propose a novel approach that evaluates performance across thousands of data-generating mechanisms drawn from nonparametric models with semi-informative priors. We call this approach a Universal Monte-Carlo Simulation. We compare the performance of two parametric estimators (a g-computation estimator that uses a parametric outcome model, and an inverse probability of treatment weighted estimator) and two nonparametric estimators (Bayesian additive regression trees, and a targeted minimum loss-based estimator that uses an ensemble of machine learning algorithms in model fitting) in estimating the average treatment effect. We summarize estimator performance in terms of bias, confidence interval coverage, and mean squared error. We find that the nonparametric estimators nearly always outperform the parametric estimators, except at the smallest sample size (N=100), where they perform similarly in terms of bias and similarly or slightly worse in terms of coverage.
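
The Universal Monte Carlo idea described above can be sketched in Python with NumPy. The sketch draws many data-generating mechanisms (DGPs), simulates one dataset from each, and scores estimators by bias, 95% confidence interval coverage, and mean squared error. This is a minimal illustration under stated assumptions, not the authors' implementation. The "prior" here is a hypothetical toy one (random coefficients on a fixed set of nonlinear terms), not the paper's semi-informative nonparametric prior. Only the two parametric estimators are shown; BART and TMLE are omitted. The helper names `draw_dgp`, `g_computation`, `fit_logistic`, `iptw`, and `universal_mc` are hypothetical.

```python
import numpy as np


def draw_dgp(rng):
    """Draw one DGP from a HYPOTHETICAL toy prior (assumption, not the paper's prior).

    Treatment enters the outcome additively, so the true ATE is exactly tau.
    """
    a = rng.normal(0.0, 0.5, size=3)  # propensity-model coefficients
    b = rng.normal(0.0, 1.0, size=4)  # outcome-model coefficients
    tau = rng.normal(0.0, 1.0)        # true average treatment effect

    def sample(n):
        w = rng.normal(size=(n, 2))
        # Nonlinear propensity and outcome, so the parametric models below are misspecified
        p = 1.0 / (1.0 + np.exp(-(a[0] + a[1] * w[:, 0] + a[2] * w[:, 1] ** 2)))
        t = rng.binomial(1, p).astype(float)
        y = (b[0] + b[1] * w[:, 0] + b[2] * np.sin(w[:, 1])
             + b[3] * w[:, 0] * w[:, 1] + tau * t + rng.normal(size=n))
        return w, t, y

    return sample, tau


def g_computation(w, t, y):
    """Parametric g-computation with a main-effects linear outcome model.

    Without treatment-covariate interactions, the g-formula estimate equals the
    OLS coefficient on t; its classical OLS standard error is returned.
    """
    X = np.column_stack([np.ones(len(y)), w, t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1], np.sqrt(cov[-1, -1])


def fit_logistic(X, t, iters=25):
    """Logistic regression by Newton-Raphson, with a tiny ridge term for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta


def iptw(w, t, y, clip=0.01):
    """Horvitz-Thompson IPTW estimator with a main-effects logistic propensity model.

    The standard error treats the estimated propensities as fixed (a simplification).
    """
    X = np.column_stack([np.ones(len(y)), w])
    e = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
    e = np.clip(e, clip, 1 - clip)  # truncate extreme propensity scores
    contrib = t * y / e - (1 - t) * y / (1 - e)
    return contrib.mean(), contrib.std(ddof=1) / np.sqrt(len(y))


def universal_mc(n_dgps=200, n=100, seed=0):
    """Score each estimator across many randomly drawn DGPs (one dataset per DGP)."""
    rng = np.random.default_rng(seed)
    estimators = {"g-computation": g_computation, "IPTW": iptw}
    errors = {name: [] for name in estimators}
    covered = {name: [] for name in estimators}
    for _ in range(n_dgps):
        sample, tau = draw_dgp(rng)
        w, t, y = sample(n)
        for name, est in estimators.items():
            psi, se = est(w, t, y)
            errors[name].append(psi - tau)
            covered[name].append(abs(psi - tau) <= 1.96 * se)  # 95% Wald interval
    return {
        name: {
            "bias": float(np.mean(errors[name])),
            "coverage": float(np.mean(covered[name])),
            "mse": float(np.mean(np.square(errors[name]))),
        }
        for name in estimators
    }


if __name__ == "__main__":
    for name, metrics in universal_mc().items():
        print(name, metrics)
```

Because each simulated DGP has its own true effect, "bias" here is the mean error across DGPs; the paper's exact summaries may differ. A nonparametric estimator could be compared by adding it to the `estimators` dictionary with the same `(estimate, standard error)` return convention.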
