Leveraging variational autoencoders for multiple data imputation

Paper title

Leveraging variational autoencoders for multiple data imputation

Authors

Breeshey Roskams-Hieter, Jude Wells, Sara Wade

Abstract

Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $β$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $β$ is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with $β$-VAEs can avoid false discoveries that arise as artefacts of imputation.
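
The notion of "empirical coverage" used in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `empirical_coverage`, the array shapes, and the synthetic Gaussian draws are all assumptions made for illustration, and no VAE is trained here. The sketch only shows how one might check whether the intervals formed by multiple imputed draws contain the held-out true values as often as their nominal level suggests.

```python
import numpy as np

def empirical_coverage(imputations, truth, level=0.95):
    """Fraction of held-out true values inside the central `level`
    interval of the imputed draws (hypothetical helper).

    imputations: shape (m, n), m imputed draws for each of n missing entries
    truth:       shape (n,), the held-out true values
    """
    alpha = 1.0 - level
    lower = np.quantile(imputations, alpha / 2, axis=0)
    upper = np.quantile(imputations, 1 - alpha / 2, axis=0)
    covered = (truth >= lower) & (truth <= upper)
    return covered.mean()

# Synthetic stand-in data (assumption): true values are standard normal.
rng = np.random.default_rng(0)
n, m = 2000, 200
truth = rng.normal(size=n)

# Draws with the correct spread versus draws that are too narrow,
# mimicking the overconfident imputations described in the abstract.
calibrated = rng.normal(size=(m, n))
overconfident = rng.normal(scale=0.3, size=(m, n))

cov_cal = empirical_coverage(calibrated, truth)
cov_over = empirical_coverage(overconfident, truth)
print(f"calibrated coverage:    {cov_cal:.3f}")
print(f"overconfident coverage: {cov_over:.3f}")
```

Under these assumptions, the well-spread draws should cover close to the nominal 95% of held-out values, while the narrow draws cover far fewer. Tuning $β$ by cross-validation, as the abstract describes, is aimed at closing this kind of gap.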
