Paper Title
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Paper Authors
Paper Abstract
Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on the Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding along with independently masking image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in supervised learning performance on benchmark datasets (up to $\uparrow$ 7%) and in transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14%) and semantic segmentation. Code and data are available on the project website: https://sustainlab-group.github.io/SatMAE/
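To make the two mechanisms named in the abstract concrete, below is a minimal sketch (not the authors' released code) of (1) encoding multi-spectral bands as groups, each tagged with its own spectral positional encoding, and (2) MAE-style independent random masking of the resulting patch tokens. The band-group split, image/patch sizes, embedding dimension, learnable (rather than sinusoidal) encodings, and all class/function names here are illustrative assumptions.

```python
# Sketch only: grouped spectral patch embedding + independent random masking,
# assuming a 6-band input split into two groups of 3 bands each.
import torch
import torch.nn as nn


class GroupedSpectralPatchEmbed(nn.Module):
    """Patchify each band group separately and add a distinct spectral encoding per group."""

    def __init__(self, img_size=96, patch_size=8,
                 band_groups=((0, 1, 2), (3, 4, 5)), embed_dim=768):
        super().__init__()
        self.band_groups = band_groups
        self.num_patches = (img_size // patch_size) ** 2
        # One patch projection per band group (groups may differ in channel count).
        self.proj = nn.ModuleList([
            nn.Conv2d(len(g), embed_dim, kernel_size=patch_size, stride=patch_size)
            for g in band_groups
        ])
        # Spatial encoding shared across groups, plus a distinct encoding per group.
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, embed_dim))
        self.group_embed = nn.Parameter(torch.zeros(len(band_groups), 1, embed_dim))

    def forward(self, x):                        # x: (B, C, H, W) multi-spectral image
        tokens = []
        for i, bands in enumerate(self.band_groups):
            t = self.proj[i](x[:, list(bands)])  # (B, D, H/p, W/p)
            t = t.flatten(2).transpose(1, 2)     # (B, N, D)
            tokens.append(t + self.pos_embed + self.group_embed[i])
        return torch.cat(tokens, dim=1)          # (B, num_groups * N, D)


def random_masking(tokens, mask_ratio=0.75):
    """Keep a random subset of tokens, drawn independently per sample (MAE-style)."""
    B, L, D = tokens.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=tokens.device)
    ids_keep = noise.argsort(dim=1)[:, :len_keep]
    return torch.gather(tokens, 1, ids_keep.unsqueeze(-1).repeat(1, 1, D))


if __name__ == "__main__":
    x = torch.randn(2, 6, 96, 96)                # toy 6-band batch
    visible = random_masking(GroupedSpectralPatchEmbed()(x))
    print(visible.shape)                         # torch.Size([2, 72, 768])
```

For the temporal variant, the same masking would be applied independently to the tokens of each timestamp, with a per-timestamp temporal embedding added alongside the spatial positional encoding; only the visible tokens are then passed to the MAE encoder, and the decoder reconstructs the masked patches.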