Paper Title


CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Paper Authors

Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Paper Abstract


We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the object queries as cluster centers, which fill the role of grouping the pixels when applied to segmentation. The clustering is computed with an alternating procedure, by first assigning pixels to the clusters by their feature affinity, and then updating the cluster centers and pixel features. Together, these operations comprise the Clustering Mask Transformer (CMT) layer, which produces cross-attention that is denser and more consistent with the final segmentation task. CMT-DeepLab improves the performance over prior art significantly by 4.4% PQ, achieving a new state-of-the-art of 55.7% PQ on the COCO test-dev set.
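The alternating clustering procedure described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual CMT layer: the function name `cmt_clustering_step`, the softmax-based soft assignment, and the weighted-mean center update are assumptions chosen to mirror the two steps the abstract names (assign pixels to clusters by feature affinity, then update the cluster centers).

```python
import numpy as np

def cmt_clustering_step(pixel_feats, centers, temperature=1.0):
    """One alternating clustering step, sketched from the abstract.

    pixel_feats: (N, D) array of pixel features.
    centers:     (K, D) array of cluster centers (the object queries).
    Shapes, the temperature, and the soft-assignment form are
    illustrative assumptions, not the paper's exact formulation.
    """
    # Step 1: assignment. Compute pixel-to-center feature affinity and
    # turn it into a soft assignment of each pixel over the K clusters.
    affinity = pixel_feats @ centers.T / temperature           # (N, K)
    assign = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    assign /= assign.sum(axis=1, keepdims=True)                # rows sum to 1

    # Step 2: update. Each center becomes the assignment-weighted mean
    # of the pixel features assigned to it.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-6)
    new_centers = weights.T @ pixel_feats                      # (K, D)

    # The soft assignment plays the role of the dense cross-attention
    # map the abstract describes; pixel features would be refined
    # analogously from the updated centers in a full layer.
    return assign, new_centers
```

Iterating this step alternates between grouping pixels and refreshing the centers, which is the clustering view of cross-attention that the abstract attributes to the CMT layer.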
