Paper Title
Robust and Generalizable Visual Representation Learning via Random Convolutions
Paper Authors
Paper Abstract
While successful for various computer vision tasks, deep neural networks have been shown to be vulnerable to texture style shifts and small perturbations to which humans are robust. In this work, we show that the robustness of neural networks can be greatly improved through the use of random convolutions as data augmentation. Random convolutions are approximately shape-preserving and may distort local textures. Intuitively, random convolutions create an infinite number of new domains with similar global shapes but random local textures. Therefore, we explore using outputs of multi-scale random convolutions as new images or mixing them with the original images during training. When applying a network trained with our approach to unseen domains, our method consistently improves the performance on domain generalization benchmarks and is scalable to ImageNet. In particular, in the challenging scenario of generalizing to the sketch domain in PACS and to ImageNet-Sketch, our method outperforms state-of-the-art methods by a large margin. More interestingly, our method can benefit downstream tasks by providing a more robust pretrained visual representation.
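To make the augmentation idea described in the abstract concrete, below is a minimal PyTorch sketch of random-convolution data augmentation with multi-scale kernels and optional mixing with the original images. The function name, kernel sizes, weight scaling, and mixing probability are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def rand_conv_augment(images, kernel_sizes=(1, 3, 5, 7), mix_prob=0.5):
    """Apply a freshly sampled random convolution to a batch of images.

    images: (N, C, H, W) float tensor. A kernel size is drawn per call
    (multi-scale), kernel weights are sampled from a Gaussian, and with
    probability `mix_prob` the filtered output is blended with the
    original images instead of replacing them.
    """
    n, c, h, w = images.shape
    k = kernel_sizes[int(torch.randint(len(kernel_sizes), (1,)))]
    # Scale weights by 1/sqrt(fan_in) so output magnitude stays comparable
    # to the input (illustrative choice).
    weight = torch.randn(c, c, k, k, device=images.device) / ((k * k * c) ** 0.5)
    filtered = F.conv2d(images, weight, padding=k // 2)
    if torch.rand(1).item() < mix_prob:
        alpha = torch.rand(1, device=images.device)  # random mixing weight
        return alpha * images + (1 - alpha) * filtered
    return filtered
```

In use, one would call `rand_conv_augment` on each training batch before the forward pass, so that every iteration effectively presents a new random-texture domain with the global shapes of the original images preserved.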