Dual-Tasks Siamese Transformer Framework for Building Damage Assessment

Paper Title

Dual-Tasks Siamese Transformer Framework for Building Damage Assessment

Authors

Chen, Hongruixuan; Nemni, Edoardo; Vallecorsa, Sofia; Li, Xi; Wu, Chen; Bromley, Lars

Abstract

Accurate, fine-grained information about the extent of building damage is essential for humanitarian relief and disaster response. However, Convolutional Neural Networks (CNNs), the most commonly used architecture in remote sensing interpretation tasks, have limited ability to model non-local relationships between pixels. Recently, the Transformer architecture, first proposed for modeling long-range dependencies in natural language processing, has shown promising results in computer vision tasks. Building on these advances, we present the first attempt at designing a Transformer-based damage assessment architecture (DamFormer). In DamFormer, a siamese Transformer encoder is first constructed to extract non-local and representative deep features from input multitemporal image pairs. Then, a multitemporal fusion module fuses information for downstream tasks. Finally, a lightweight dual-tasks decoder aggregates multi-level features for the final prediction. To the best of our knowledge, this is the first time such a deep Transformer-based network has been proposed for multitemporal remote sensing interpretation tasks. Experimental results on the large-scale damage assessment dataset xBD demonstrate the potential of the Transformer-based architecture.
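
The three stages named in the abstract (shared-weight encoder, multitemporal fusion, dual-task decoding) can be illustrated with a small NumPy sketch. This is not the DamFormer implementation. Every name here (`ToyEncoder`, `ToyDamageNet`, `patchify`), the single attention layer, the concatenation-based fusion, the random weights, and the choice of four damage classes are assumptions made purely for illustration. The multi-level feature aggregation mentioned in the abstract is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy weights are reproducible

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches, one flat row each."""
    h, w, c = img.shape
    blocks = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(-1, p * p * c)

class ToyEncoder:
    """Hypothetical stand-in for a Transformer encoder: one self-attention layer."""
    def __init__(self, patch_dim, dim):
        self.embed = rng.normal(0, 0.1, (patch_dim, dim))
        self.wq = rng.normal(0, 0.1, (dim, dim))
        self.wk = rng.normal(0, 0.1, (dim, dim))
        self.wv = rng.normal(0, 0.1, (dim, dim))

    def __call__(self, patches):
        x = patches @ self.embed
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        # Every token attends to every other token, so the features are non-local.
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        return x + attn @ v  # residual connection

class ToyDamageNet:
    def __init__(self, patch=4, channels=3, dim=16, n_damage=4):
        self.patch = patch
        # A single encoder object is reused for both images: this is the siamese part.
        self.encoder = ToyEncoder(patch * patch * channels, dim)
        self.fuse = rng.normal(0, 0.1, (2 * dim, dim))        # assumed fusion: concat + projection
        self.loc_head = rng.normal(0, 0.1, (dim, 2))          # building vs. background
        self.dmg_head = rng.normal(0, 0.1, (dim, n_damage))   # assumed number of damage levels

    def __call__(self, pre, post):
        f_pre = self.encoder(patchify(pre, self.patch))
        f_post = self.encoder(patchify(post, self.patch))
        fused = np.concatenate([f_pre, f_post], axis=-1) @ self.fuse
        # Two heads read the same fused features: the dual-task part.
        return softmax(fused @ self.loc_head), softmax(fused @ self.dmg_head)

# Usage: random arrays stand in for a pre-disaster / post-disaster image pair.
net = ToyDamageNet()
pre, post = rng.random((8, 8, 3)), rng.random((8, 8, 3))
loc, dmg = net(pre, post)
print(loc.shape, dmg.shape)
```

Each row of `loc` and `dmg` is a per-patch probability distribution. A real model would upsample such outputs to per-pixel maps and would be trained rather than randomly initialized.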
