On the Pros and Cons of Momentum Encoder in Self-Supervised Visual Representation Learning

Paper Title

On the Pros and Cons of Momentum Encoder in Self-Supervised Visual Representation Learning

Authors

Trung Pham, Chaoning Zhang, Axi Niu, Kang Zhang, Chang D. Yoo

Abstract

Exponential Moving Average (EMA or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, to enhance performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component in modern SSL frameworks, the benefit of momentum is not well understood. We find that its success can be at least partly attributed to a stability effect. As a first attempt, we analyze how EMA affects each part of the encoder and reveal that the portion near the encoder's input plays an insignificant role, while the later parts have much more influence. By monitoring the gradient of the overall loss with respect to the output of each block in the encoder, we observe that the final layers tend to fluctuate much more than other layers during backpropagation, i.e., they are less stable. Interestingly, we show that applying EMA only to the final part of the SSL encoder, i.e., the projector, instead of the whole deep network encoder can give comparable or better performance. Our proposed projector-only momentum helps maintain the benefit of EMA but avoids the double forward computation.
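
The abstract does not spell out the update rule, so the following is only a minimal sketch of the standard EMA update used by momentum-based methods such as MoCo: the momentum copy is moved toward the online parameters as target = m * target + (1 - m) * online. The function name `ema_update`, the toy weight lists, and the momentum value 0.9 are illustrative assumptions, not taken from the paper. In a projector-only setup, an update of this kind would be applied to the projector's parameters alone rather than to the whole encoder.

```python
def ema_update(target, online, momentum=0.99):
    """Return target parameters moved toward the online ones by one EMA step.

    Hypothetical helper for illustration: parameters are plain lists of floats.
    """
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


# Toy "projector" weights (made-up numbers). The online copy would be trained
# by gradient descent; the momentum copy changes only through ema_update.
online_projector = [1.0, -2.0, 0.5]
momentum_projector = list(online_projector)  # start as an exact copy

# Pretend one optimizer step changed the online projector.
online_projector = [2.0, -1.0, 0.5]

# The momentum copy follows slowly, which damps step-to-step fluctuations.
momentum_projector = ema_update(momentum_projector, online_projector, momentum=0.9)
print(momentum_projector)
```

With a momentum close to 1 the copy changes slowly, which is one way to picture the stability effect the abstract refers to; with a momentum of 0 the copy simply equals the online parameters.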
