Self-Supervised Learning based on Heat Equation

Paper title

Self-Supervised Learning based on Heat Equation

Authors

Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

Abstract

This paper presents a new perspective on self-supervised learning based on extending the heat equation into a high-dimensional feature space. In particular, we remove the time dependence through the steady-state condition and extend the remaining 2D Laplacian from x-y isotropic to linearly correlated. Furthermore, we simplify it by splitting the x and y axes into two first-order linear differential equations. This simplification explicitly models spatial invariance along the horizontal and vertical directions separately, supporting prediction across image blocks. It leads to a very simple masked image modeling (MIM) method named QB-Heat. QB-Heat leaves a single block, one quarter of the image in size, unmasked and linearly extrapolates the other three masked quarters. It brings MIM to CNNs without bells and whistles, and even works well for pre-training light-weight networks that are suitable for both image classification and object detection without fine-tuning. Compared with MoCo-v2 on pre-training a Mobile-Former with 5.8M parameters and 285M FLOPs, QB-Heat is on par in linear probing on ImageNet but clearly outperforms it in non-linear probing, which adds a transformer block before the linear classifier (65.6% vs. 52.9%). When transferred to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP, respectively. This work provides an insightful hypothesis on the invariance within visual representations across different shapes and textures: a linear relationship between horizontal and vertical derivatives. The code will be publicly released.
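
The chain of simplifications described in the abstract can be written out as a short derivation. This is only a sketch of one plausible reading: the abstract gives no formulas, so the symbols u, α, z, S, A, B and Δx are notation assumed here for illustration, and the paper's own formulation may differ.

```latex
% Assumed notation (not from the abstract): u(x, y, t) is a scalar heat field,
% \alpha a diffusivity, \mathbf{z}(x, y) \in \mathbb{R}^{C} a feature vector,
% and S, A, B are constant C x C matrices.
\begin{align*}
\frac{\partial u}{\partial t}
  &= \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)
  && \text{(2D heat equation)} \\
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}
  &= 0
  && \text{(steady state, } \partial u / \partial t = 0 \text{)} \\
\frac{\partial^2 \mathbf{z}}{\partial x^2}
  &= S \, \frac{\partial^2 \mathbf{z}}{\partial y^2}
  && \text{(linearly correlated extension; } S = -1 \text{ recovers the scalar case)} \\
\frac{\partial \mathbf{z}}{\partial x} = A \mathbf{z}, \quad
\frac{\partial \mathbf{z}}{\partial y}
  &= B \mathbf{z}
  && \text{(first-order split; implies the line above when } A^2 = S B^2 \text{)} \\
\mathbf{z}(x + \Delta x, y) = e^{\Delta x A} \, \mathbf{z}(x, y)
  &\approx (I + \Delta x \, A) \, \mathbf{z}(x, y)
  && \text{(linear extrapolation along } x \text{)}
\end{align*}
```

Under these assumptions, a feature in the unmasked quarter can be pushed one step horizontally with (I + Δx A) and, analogously, one step vertically with (I + Δy B), which is one way to read "extrapolates other three masked quarters linearly". If B is invertible, the two first-order equations also give ∂z/∂x = A B⁻¹ ∂z/∂y, a linear relationship between horizontal and vertical derivatives of the kind the abstract's closing hypothesis describes.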
