Paper Title

AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries

Paper Authors

Qianjiang Hu, Xiao Wang, Wei Hu, Guo-Jun Qi

Paper Abstract

Contrastive learning relies on constructing a collection of negative examples that are sufficiently hard to discriminate against positive queries when their representations are self-trained. Existing contrastive learning methods either maintain a queue of negative samples over minibatches, of which only a small portion is updated per iteration, or use only the other examples in the current minibatch as negatives. The former cannot closely track the change of the learned representation over iterations, since the queue is never updated as a whole; the latter discards useful information from past minibatches. Instead, we propose to directly learn a set of negative adversaries that play against the self-trained representation. The two players, the representation network and the negative adversaries, are alternately updated to obtain the most challenging negative examples, against which the representation of positive queries is trained to discriminate. We further show that maximizing the adversarial contrastive loss updates the negative adversaries toward a weighted combination of positive queries, thereby allowing them to closely track the change of representations over time. Experimental results demonstrate that the proposed Adversarial Contrastive (AdCo) model not only achieves superior performance (a top-1 accuracy of 73.2% after 200 epochs and 75.7% after 800 epochs under linear evaluation on ImageNet), but also can be pre-trained more efficiently, with fewer epochs.
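
To make the two-player game concrete, here is a brief sketch in our own notation (q_i is a query embedding, k_i its positive key, n_j a learnable negative adversary, \tau the temperature); it follows the abstract, not necessarily the paper's exact formulation. With the InfoNCE-style adversarial contrastive loss

    L(\theta, \{n_j\}) = -\frac{1}{B} \sum_{i=1}^{B} \log \frac{\exp(q_i^\top k_i / \tau)}{\exp(q_i^\top k_i / \tau) + \sum_j \exp(q_i^\top n_j / \tau)},

the gradient with respect to each negative is

    \frac{\partial L}{\partial n_j} = \frac{1}{B\tau} \sum_i p_{ij}\, q_i, \qquad p_{ij} = \frac{\exp(q_i^\top n_j / \tau)}{\exp(q_i^\top k_i / \tau) + \sum_{j'} \exp(q_i^\top n_{j'} / \tau)},

so gradient ascent moves every negative toward a softmax-weighted combination of the positive queries, which is exactly the property the abstract states. Below is a minimal PyTorch-style sketch of one alternating step; the names (encoder, negatives, adco_loss), the hyperparameters, and the detached key branch are our simplifying assumptions, not the authors' released implementation:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    B, D, K = 8, 128, 4096  # batch size, embedding dim, number of negatives (illustrative)

    # Stand-in for a backbone + projection head producing D-dim embeddings.
    encoder = torch.nn.Sequential(
        torch.nn.Linear(512, 256), torch.nn.ReLU(), torch.nn.Linear(256, D))

    # Learnable bank of negative adversaries, initialized on the unit sphere.
    negatives = torch.nn.Parameter(F.normalize(torch.randn(K, D), dim=1))

    opt_net = torch.optim.SGD(encoder.parameters(), lr=0.03)
    opt_neg = torch.optim.SGD([negatives], lr=3.0)

    def adco_loss(q, k, neg, tau=0.12):
        # InfoNCE over one positive key and K adversarial negatives per query.
        l_pos = (q * k).sum(dim=1, keepdim=True) / tau   # (B, 1)
        l_neg = q @ neg.t() / tau                        # (B, K)
        logits = torch.cat([l_pos, l_neg], dim=1)
        labels = torch.zeros(q.size(0), dtype=torch.long)  # positive sits at index 0
        return F.cross_entropy(logits, labels)

    # Two augmented views of the same minibatch (random stand-ins here).
    x1, x2 = torch.randn(B, 512), torch.randn(B, 512)
    q = F.normalize(encoder(x1), dim=1)
    k = F.normalize(encoder(x2), dim=1).detach()  # simplification; a momentum branch may be used instead

    loss = adco_loss(q, k, F.normalize(negatives, dim=1))
    opt_net.zero_grad()
    opt_neg.zero_grad()
    loss.backward()
    opt_net.step()         # player 1: the encoder descends the contrastive loss
    negatives.grad.neg_()  # player 2: flip the gradient so the negatives ascend it
    opt_neg.step()

Because both players share a single backward pass, the one sign flip on negatives.grad is what turns the minimization step into the maximization step of the adversarial game.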
