Title
On Disentangled Representations Learned From Correlated Data
Authors
Abstract
The focus of disentanglement approaches has been on identifying independent factors of variation in data. However, the causal variables underlying real-world observations are often not statistically independent. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data in a large-scale empirical study (including 4260 models). We show and quantify that systematically induced correlations in the dataset are being learned and reflected in the latent representations, which has implications for downstream applications of disentanglement such as fairness. We also demonstrate how to resolve these latent correlations, either by using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
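The abstract does not say how the correlations are induced or measured. A minimal sketch, assuming one simple scheme: sample two discrete ground-truth factors jointly with a weight of exp(-(c1 - alpha*c2)^2 / (2*sigma^2)), which concentrates probability mass along the diagonal, and then quantify the resulting dependence with the Pearson correlation coefficient. The function names `sample_correlated_factors` and `pearson`, and the parameters `alpha` and `sigma`, are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_correlated_factors(n1, n2, sigma, num_samples, seed=0):
    """Draw (c1, c2) pairs with weight exp(-(c1 - alpha*c2)^2 / (2*sigma^2)).

    Hypothetical correlation scheme (an assumption, not taken from the abstract).
    c1 takes values 0..n1-1 and c2 takes values 0..n2-1 (requires n2 > 1).
    alpha rescales c2 onto the range of c1 so the mass lies near the diagonal.
    Smaller sigma gives a stronger correlation; a very large sigma is close
    to sampling both factors independently and uniformly.
    """
    rng = random.Random(seed)
    alpha = (n1 - 1) / (n2 - 1)
    pairs = [(c1, c2) for c1 in range(n1) for c2 in range(n2)]
    weights = [
        math.exp(-((c1 - alpha * c2) ** 2) / (2 * sigma**2)) for c1, c2 in pairs
    ]
    # Sample whole (c1, c2) pairs from the normalized joint weights.
    return rng.choices(pairs, weights=weights, k=num_samples)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Compare a strongly correlated setting with a nearly independent one.
    for sigma in (0.5, 2.0, 100.0):
        samples = sample_correlated_factors(10, 10, sigma=sigma, num_samples=5000)
        r = pearson([a for a, _ in samples], [b for _, b in samples])
        print(f"sigma={sigma}: correlation={r:.3f}")
```

The same `pearson` helper could, under the same assumptions, be applied to pairs of latent dimensions of a trained model's encodings to check whether a correlation induced in the data carries over into the learned representation.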