Paper Title
Extending Momentum Contrast with Cross Similarity Consistency Regularization
Paper Authors
Paper Abstract
Contrastive self-supervised representation learning methods maximize the similarity between positive pairs while tending to minimize the similarity between negative pairs. However, the interplay between negative pairs is generally ignored, because these methods have no mechanism for treating negative pairs differently according to their specific differences and similarities. In this paper, we present Extended Momentum Contrast (XMoCo), a self-supervised representation learning method built on the momentum-encoder unit introduced in the MoCo family. To this end, we introduce a cross consistency regularization loss, with which we extend transformation consistency to dissimilar images (negative pairs). Under the cross consistency regularization rule, we argue that the semantic representations associated with any pair of images (positive or negative) should preserve their cross-similarity under pretext transformations. Moreover, we further regularize the training loss by enforcing a uniform distribution of similarity over the negative pairs across a batch. The proposed regularization can easily be added to existing self-supervised learning algorithms in a plug-and-play fashion. Empirically, we report competitive performance on the standard ImageNet-1K linear head classification benchmark. In addition, by transferring the learned representations to common downstream tasks, we show that using XMoCo with commonly used augmentations can improve performance on such tasks. We hope the findings of this paper motivate researchers to take into consideration the important interplay among negative examples in self-supervised learning.
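The two regularizers described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract gives no equations, so the function names, the squared-error discrepancy, the softmax temperature, and the KL-to-uniform penalty below are all assumptions chosen only to make the idea concrete. The first function penalizes changes in pairwise similarity between two augmented views of a batch; the second penalizes a non-uniform distribution of similarity over the negatives of each sample.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity_matrix(a, b):
    # Cosine similarity between every row of a and every row of b.
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def cross_consistency_loss(z1, z2):
    # z1[i] and z2[i] are embeddings of two augmentations of image i.
    # Assumption: the similarity between images i and j should be the same
    # in both views, and the gap is measured with a mean squared error.
    s1 = similarity_matrix(z1, z1)
    s2 = similarity_matrix(z2, z2)
    n = len(z1)
    gap = sum((s1[i][j] - s2[i][j]) ** 2 for i in range(n) for j in range(n))
    return gap / (n * n)

def negative_uniformity_loss(z1, z2, temperature=0.2):
    # Assumption: for each sample, a softmax over its similarities to the
    # negatives (j != i) is pushed toward the uniform distribution via
    # KL(p || uniform). The temperature value is hypothetical.
    s = similarity_matrix(z1, z2)
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [s[i][j] / temperature for j in range(n) if j != i]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        norm = sum(exps)
        probs = [e / norm for e in exps]
        # KL(p || uniform over K negatives) = sum_k p_k * log(p_k * K)
        total += sum(p * math.log(p * len(probs)) for p in probs if p > 0)
    return total / n
```

In a plug-and-play setting, terms like these would be added with some weighting to an existing contrastive loss; the weights and the choice of which encoder produces z1 and z2 are left open here because the abstract does not specify them.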