Paper Title

Domain Adaptation under Missingness Shift

Authors

Helen Zhou, Sivaraman Balakrishnan, Zachary C. Lipton

Abstract

Rates of missing data often depend on record-keeping policies and thus may change across times and locations, even when the underlying features are comparatively stable. In this paper, we introduce the problem of Domain Adaptation under Missingness Shift (DAMS). Here, (labeled) source data and (unlabeled) target data would be exchangeable but for different missing data mechanisms. We show that if missing data indicators are available, DAMS reduces to covariate shift. Addressing cases where such indicators are absent, we establish the following theoretical results for underreporting completely at random: (i) covariate shift is violated (adaptation is required); (ii) the optimal linear source predictor can perform arbitrarily worse on the target domain than always predicting the mean; (iii) the optimal target predictor can be identified, even when the missingness rates themselves are not; and (iv) for linear models, a simple analytic adjustment yields consistent estimates of the optimal target parameters. In experiments on synthetic and semi-synthetic data, we demonstrate the promise of our methods when assumptions hold. Finally, we discuss a rich family of future extensions.
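
As a rough illustration of claim (i), the sketch below simulates underreporting completely at random with a single feature. It is not the paper's method or its analytic adjustment. The data-generating process, the missingness rates (0.1 for source, 0.6 for target), and the helper names `simulate`, `fit_ols`, and `mse` are all assumptions made for this example. A missing value is recorded as 0 with no indicator, and the best linear predictor is fitted separately in each domain.

```python
import random

def simulate(n, missing_rate, rng, beta=2.0, mu=1.0):
    """Draw (observed x, y) pairs; x is recorded as 0 with probability missing_rate."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mu, 1.0)                 # true feature (assumed Gaussian)
        y = beta * x + rng.gauss(0.0, 0.5)     # label depends on the true feature
        # Underreporting: a missing value shows up as 0, with no indicator.
        x_obs = 0.0 if rng.random() < missing_rate else x
        xs.append(x_obs)
        ys.append(y)
    return xs, ys

def fit_ols(xs, ys):
    """Least-squares fit of y on x with an intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(params, xs, ys):
    """Mean squared error of the linear predictor params = (slope, intercept)."""
    slope, intercept = params
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
src_x, src_y = simulate(20000, 0.1, rng)   # source: 10% underreported (assumed rate)
tgt_x, tgt_y = simulate(20000, 0.6, rng)   # target: 60% underreported (assumed rate)

src_fit = fit_ols(src_x, src_y)
# Oracle fit for comparison only: it uses target labels, which DAMS does not provide.
tgt_fit = fit_ols(tgt_x, tgt_y)

print("source slope:", round(src_fit[0], 2))
print("target slope:", round(tgt_fit[0], 2))
print("source predictor MSE on target:", round(mse(src_fit, tgt_x, tgt_y), 2))
print("oracle target predictor MSE on target:", round(mse(tgt_fit, tgt_x, tgt_y), 2))
```

In this toy setup the population slope works out to beta / (1 + r) for missingness rate r, so the source and target optima differ even though the underlying features and labels follow the same distribution. The source predictor is therefore suboptimal on the target, which is the sense in which adaptation is required.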
