Paper Title

Ivy: Instrumental Variable Synthesis for Causal Inference

Authors

Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré

Abstract

A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates (which are not necessarily strong, or even valid, IVs) into a single "summary" that is plugged into causal effect estimators in place of an IV. In genetic epidemiology, such approaches are known as allele scores. Allele scores require strong assumptions (independence and validity of all IV candidates) for the resulting estimate to be reliable. To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner. Theoretically, we characterize this robustness, its limits, and its impact on the resulting causal estimates. Empirically, Ivy can correctly identify the directionality of known relationships and is robust against false discovery (median effect size <= 0.025) on three real-world datasets with no causal effects, while allele scores return more biased estimates (median effect size >= 0.118).
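To make the allele-score baseline described in the abstract concrete, the following is a minimal sketch on synthetic data. It is not the Ivy method, and it is not taken from the paper. It assumes the simplest setting, in which every IV candidate is independent and valid, which are exactly the assumptions the abstract says allele scores require. The helper names `wald_estimate` and `simulate`, the use of an unweighted sum as the allele score, and all numeric parameters are illustrative assumptions.

```python
import numpy as np


def wald_estimate(z, x, y):
    """Ratio (Wald) IV estimate: sample cov(z, y) / sample cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))


def simulate(n=20000, n_candidates=10, true_effect=0.0, seed=0):
    """Synthetic data (hypothetical parameters) with an unobserved confounder."""
    rng = np.random.default_rng(seed)
    # Independent, valid binary IV candidates: they influence y only through x.
    candidates = rng.binomial(1, 0.3, size=(n, n_candidates)).astype(float)
    confounder = rng.normal(size=n)  # affects both x and y, never observed
    x = candidates @ np.full(n_candidates, 0.3) + confounder + rng.normal(size=n)
    y = true_effect * x + 2.0 * confounder + rng.normal(size=n)
    return candidates, x, y


candidates, x, y = simulate()

# Unweighted allele score: one "summary" variable built from all candidates.
allele_score = candidates.sum(axis=1)

# Using x as its own instrument reduces to the OLS slope of y on x,
# which is biased here because of the confounder.
naive = wald_estimate(x, x, y)

# Plugging the allele score in place of a single IV.
iv = wald_estimate(allele_score, x, y)

print(f"naive OLS slope:          {naive:.3f}")
print(f"allele-score IV estimate: {iv:.3f}")
```

With a true effect of zero, the naive slope should come out far from zero while the allele-score estimate should stay close to it. That holds only because the simulated candidates satisfy the independence and validity assumptions. Making some candidates correlated with each other, or letting them affect y directly, breaks those assumptions, which is the setting the abstract says Ivy is designed to handle.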
