Paper Title
What Should Not Be Contrastive in Contrastive Learning
Paper Authors
Paper Abstract
Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations. However, these methods implicitly assume a particular set of representational invariances (e.g., invariance to color), and can perform poorly when a downstream task violates this assumption (e.g., distinguishing red vs. yellow cars). We introduce a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances. Our model learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces, each of which is invariant to all but one augmentation. We use a multi-head network with a shared backbone which captures information across each augmentation and alone outperforms all baselines on downstream tasks. We further find that the concatenation of the invariant and varying spaces performs best across all tasks we investigate, including coarse-grained, fine-grained, and few-shot downstream classification tasks, and various data corruptions.
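For concreteness, below is a minimal PyTorch sketch of the multi-head design the abstract describes: a shared backbone feeding one projection head per embedding space, with one space meant to be invariant to all augmentations and one additional space per augmentation that stays sensitive to it. All names here (`MultiHeadEncoder`, the `augmentations` list, `dim`) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the multi-head, shared-backbone architecture
# described in the abstract. A ResNet-50 backbone is shared across all
# embedding spaces; each space gets its own small projection head.
import torch
import torch.nn as nn
import torchvision.models as models


class MultiHeadEncoder(nn.Module):
    def __init__(self, augmentations=("color", "rotation", "crop"), dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # shared backbone, classifier removed
        self.backbone = backbone
        # One head for the all-invariant space, plus one head per
        # augmentation whose space remains *sensitive* to that augmentation.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(feat_dim, feat_dim),
                nn.ReLU(inplace=True),
                nn.Linear(feat_dim, dim),
            )
            for name in ("all_invariant", *augmentations)
        })

    def forward(self, x):
        h = self.backbone(x)
        # L2-normalize each embedding, as is standard for contrastive losses.
        return {
            name: nn.functional.normalize(head(h), dim=-1)
            for name, head in self.heads.items()
        }
```

Note that the heads themselves are architecturally identical; what would make each augmentation-specific space sensitive to its augmentation is how positive pairs are formed for that space's contrastive loss. For example, positives for the color space would share the same color transformation while varying the remaining augmentations. For downstream tasks, the abstract's best-performing representation is the concatenation of all these embedding spaces.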