Paper Title

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Authors

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

Abstract

Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or views) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a swapped prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory-efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.
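To make the swapped prediction mechanism more concrete, below is a minimal sketch (PyTorch-style Python) of a SwAV-style loss, written from the description in the abstract rather than from the authors' released code: Sinkhorn-Knopp normalization turns prototype similarity scores into soft cluster assignments ("codes"), and the code of each view is predicted from the features of the other view. All names (`sinkhorn`, `swapped_prediction_loss`, `z1`, `z2`, `prototypes`) and hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    """Sinkhorn-Knopp normalization: turn similarity scores (B, K) into soft,
    roughly equipartitioned cluster assignments (codes). Codes carry no gradient."""
    Q = torch.exp(scores / eps).t()               # (K prototypes, B samples)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # normalize over samples per prototype
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # normalize over prototypes per sample
    return (Q * B).t()                            # (B, K), each row sums to 1

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    """Swapped prediction: the code of one view is predicted from the other
    view's features. z1, z2: L2-normalized features (B, D) of two augmented views."""
    p = F.normalize(prototypes, dim=1)            # (K, D) learnable prototype vectors
    s1, s2 = z1 @ p.t(), z2 @ p.t()               # prototype similarity scores (B, K)
    q1, q2 = sinkhorn(s1), sinkhorn(s2)           # soft cluster assignments (codes)
    log_p1 = F.log_softmax(s1 / temperature, dim=1)
    log_p2 = F.log_softmax(s2 / temperature, dim=1)
    # cross-entropy between the code of one view and the prediction from the other view
    return -0.5 * ((q2 * log_p1).sum(dim=1) + (q1 * log_p2).sum(dim=1)).mean()
```

In an actual training loop, `z1` and `z2` would come from a shared encoder applied to two augmentations of the same batch (or to the additional low-resolution multi-crop views), and `prototypes` would be a learnable parameter matrix updated by backpropagation.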
