Paper Title
Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models
Paper Authors
Paper Abstract
We offer straightforward theoretical results that justify incorporating machine learning in the standard linear instrumental variable setting. The key idea is to use machine learning, combined with sample-splitting, to predict the treatment variable from the instrument and any exogenous covariates, and then to use this predicted treatment and the covariates as technical instruments to recover the second-stage coefficients. This allows the researcher to extract non-linear covariation between the treatment and the instrument, which may dramatically improve estimation precision and robustness by boosting instrument strength. Importantly, we constrain the machine-learned predictions to be linear in the exogenous covariates, thus avoiding spurious identification arising from non-linear relationships between the treatment and the covariates. We show that this approach delivers consistent and asymptotically normal estimates under weak conditions and that it may be adapted to be semiparametrically efficient (Chamberlain, 1992). Our method preserves the standard intuitions and interpretations of linear instrumental variable methods, including under weak identification, and provides a simple, user-friendly upgrade to the applied economics toolbox. We illustrate our method with an example in law and criminal justice, examining the causal effect of appellate court reversals on district court sentencing decisions.
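The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a cubic polynomial in the instrument stands in for the machine-learning first stage, the function names (`first_stage_features`, `split_sample_iv`) and the simulated data are hypothetical, and the efficiency adaptation and weak-identification inference mentioned in the abstract are not covered. The first stage is fit out of fold (sample-splitting), it is non-linear in the instrument but linear in the covariate, and the predicted treatment together with the covariate then serves as the instrument set in a just-identified IV regression.

```python
import numpy as np


def first_stage_features(z, x):
    # Assumption: a cubic polynomial in the instrument z stands in for a
    # generic ML learner. The covariate x enters only linearly, mirroring
    # the constraint described in the abstract.
    return np.column_stack([np.ones_like(z), z, z**2, z**3, x])


def split_sample_iv(y, d, z, x, n_folds=2, seed=0):
    """Hypothetical helper: cross-fitted first stage, then just-identified IV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    feats = first_stage_features(z, x)
    d_hat = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit the first stage on the other folds, predict on the held-out fold.
        coef, *_ = np.linalg.lstsq(feats[train], d[train], rcond=None)
        d_hat[test] = feats[test] @ coef
    # Technical instruments: predicted treatment, intercept, covariate.
    w = np.column_stack([d_hat, np.ones(n), x])
    # Second-stage regressors: actual treatment, intercept, covariate.
    r = np.column_stack([d, np.ones(n), x])
    # Just-identified IV estimator: solve (W'R) beta = W'y.
    return np.linalg.solve(w.T @ r, w.T @ y)


# Simulated data (illustrative only): the instrument affects the treatment
# through z**2, so a first stage that is linear in z would be weak.
rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)
x = rng.normal(size=n)
u = rng.normal(size=n)  # unobserved confounder
d = z**2 + 0.5 * x + u + rng.normal(size=n)
y = 2.0 * d + 1.0 * x + u + rng.normal(size=n)  # true treatment effect is 2

beta = split_sample_iv(y, d, z, x)
print("coefficient on treatment:", beta[0])
```

In this simulation the estimated treatment coefficient should land near the true value of 2, whereas ordinary least squares would be biased by the confounder `u`. Swapping the polynomial for another learner would only change how `d_hat` is produced, provided its dependence on the covariates stays linear.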