Paper Title

MUSTER: A Multi-scale Transformer-based Decoder for Semantic Segmentation

Paper Authors

Jing Xu, Wentao Shi, Pan Gao, Zhengwei Wang, Qizhu Li

Paper Abstract

In recent works on semantic segmentation, there has been a significant focus on designing and integrating transformer-based encoders. However, less attention has been given to transformer-based decoders. We emphasize that the decoder stage is as vital as the encoder in achieving superior segmentation performance: it disentangles and refines high-level cues, enabling precise object boundary delineation at the pixel level. In this paper, we introduce a novel transformer-based decoder called MUSTER, which seamlessly integrates with hierarchical encoders and consistently delivers high-quality segmentation results, regardless of the encoder architecture. Furthermore, we present a variant of MUSTER that reduces FLOPs while maintaining performance. MUSTER incorporates carefully designed multi-head skip attention (MSKA) units and introduces an innovative upsampling operation. The MSKA units enable the fusion of multi-scale features from the encoder and decoder, facilitating comprehensive information integration. The upsampling operation leverages encoder features to enhance object localization and surpasses traditional upsampling methods, improving mIoU (mean Intersection over Union) by 0.4% to 3.2%. On the challenging ADE20K dataset, our best model achieves a single-scale mIoU of 50.23 and a multi-scale mIoU of 51.88, which is on par with the current state-of-the-art model. Remarkably, we achieve this while significantly reducing the number of FLOPs by 61.3%. Our source code and models are publicly available at: https://github.com/shiwt03/MUSTER.
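To make the two components described in the abstract more concrete, below is a minimal PyTorch sketch of the general ideas only, not the authors' actual MUSTER implementation (see the linked repository for that). The class names `SkipAttention` and `GuidedUpsample`, and all shapes and hyperparameters, are illustrative assumptions: decoder features attend to encoder skip features via cross-attention (the skip-attention idea behind MSKA), and upsampling concatenates encoder features to sharpen localization.

```python
# Hypothetical sketch of the two ideas from the abstract; NOT the MUSTER code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipAttention(nn.Module):
    """Cross-attention block: decoder queries attend to encoder skip features."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_ffn = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, dec_tokens: torch.Tensor, enc_tokens: torch.Tensor) -> torch.Tensor:
        # dec_tokens: (B, N, C) decoder features -> queries
        # enc_tokens: (B, M, C) encoder skip features -> keys/values
        q = self.norm_q(dec_tokens)
        kv = self.norm_kv(enc_tokens)
        fused, _ = self.attn(q, kv, kv)
        x = dec_tokens + fused                  # residual fusion of the two paths
        return x + self.ffn(self.norm_ffn(x))   # feed-forward refinement


class GuidedUpsample(nn.Module):
    """2x upsampling that mixes in encoder features for better localization."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1)

    def forward(self, dec_map: torch.Tensor, enc_map: torch.Tensor) -> torch.Tensor:
        # dec_map: (B, C, H, W) decoder map; enc_map: (B, C, 2H, 2W) encoder map
        up = F.interpolate(dec_map, scale_factor=2, mode="bilinear",
                           align_corners=False)
        return self.proj(torch.cat([up, enc_map], dim=1))


if __name__ == "__main__":
    attn = SkipAttention(dim=256)
    dec = torch.randn(2, 1024, 256)            # e.g. 32x32 decoder tokens
    enc = torch.randn(2, 1024, 256)            # matching encoder skip tokens
    print(attn(dec, enc).shape)                # torch.Size([2, 1024, 256])

    up = GuidedUpsample(dim=256)
    dec_map = torch.randn(2, 256, 32, 32)
    enc_map = torch.randn(2, 256, 64, 64)
    print(up(dec_map, enc_map).shape)          # torch.Size([2, 256, 64, 64])
```

The residual connection in `SkipAttention` lets the decoder path pass through unchanged when the encoder features add nothing useful, which is a common design choice in cross-attention fusion blocks; the paper's actual MSKA unit may differ in structure and detail.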
