Paper Title
Augmentation Component Analysis: Modeling Similarity via the Augmentation Overlaps
Paper Authors
Paper Abstract
Self-supervised learning aims to learn an embedding space in which semantically similar samples lie close together. Contrastive learning methods pull different views of the same sample together and push different samples apart; this exploits the semantic invariance of augmentation but ignores the relationships between samples. To better exploit the power of augmentation, we observe that semantically similar samples are more likely to produce similar augmented views, so the augmented views can serve as a special description of a sample. In this paper, we model this description as the distribution over augmented views and call it the augmentation feature. The similarity between augmentation features reflects how much the views of two samples overlap and is related to their semantic similarity. To avoid the computational burden of explicitly estimating the augmentation feature, we propose Augmentation Component Analysis (ACA), which uses a contrastive-like loss to learn principal components and an on-the-fly projection loss to embed data. ACA is equivalent to an efficient PCA-style dimensionality reduction: it extracts low-dimensional embeddings that theoretically preserve the similarity of augmentation distributions between samples. Empirical results show that our method achieves competitive performance against various traditional contrastive learning methods on different benchmarks.
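For concreteness, the augmentation overlap described in the abstract can be written out as follows. This is a minimal formalization inferred from the abstract itself; the symbols $a_x$, $\mathcal{A}$, and $\mathrm{sim}$ are our notation for illustration, not necessarily the paper's. Treating augmentation as a conditional distribution $p(\bar{x} \mid x)$ over views $\bar{x}$, the augmentation feature of a sample $x$ and the overlap similarity of two samples $x_1, x_2$ are

$$
a_x \;=\; \big(\, p(\bar{x} \mid x) \,\big)_{\bar{x} \in \mathcal{A}},
\qquad
\mathrm{sim}(x_1, x_2) \;=\; \langle a_{x_1}, a_{x_2} \rangle
\;=\; \sum_{\bar{x} \in \mathcal{A}} p(\bar{x} \mid x_1)\, p(\bar{x} \mid x_2),
$$

where $\mathcal{A}$ is the space of augmented views. $\mathrm{sim}(x_1, x_2)$ is large exactly when the two samples assign high probability to the same views, i.e., when their augmentations overlap; ACA then seeks a low-dimensional embedding that preserves this similarity without ever computing $a_x$ explicitly.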