Paper Title
Statistical Data Integration in Survey Sampling: A Review
Authors
Abstract
Finite population inference is a central goal in survey sampling, and probability sampling is the main statistical approach to it. However, probability sampling faces challenges from high costs and increasing non-response rates. Data integration offers a timely solution by leveraging multiple data sources to yield more robust and efficient inference than any single data source alone. The appropriate technique depends on the types of samples and the available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a wide range of integration methods, such as generalized least squares, calibration weighting, inverse probability weighting, mass imputation, and doubly robust methods. Finally, we highlight important questions for future research.