Attention-guided Chained Context Aggregation for Semantic Segmentation

Paper Title

Attention-guided Chained Context Aggregation for Semantic Segmentation

Authors

Tang, Quan; Liu, Fagui; Zhang, Tong; Jiang, Jun; Zhang, Yu

Abstract

The way features propagate in Fully Convolutional Networks is crucial for capturing the multi-scale context needed to obtain precise segmentation masks. This paper proposes a novel series-parallel hybrid paradigm, the Chained Context Aggregation Module (CAM), to diversify feature propagation. CAM obtains features at various spatial scales through chain-connected, ladder-style information flows and fuses them in a two-stage process: pre-fusion and re-fusion. The serial flow continuously enlarges the receptive fields of output neurons, while the parallel flows encode different region-based contexts. Each information flow is a shallow encoder-decoder with an appropriate down-sampling scale to capture sufficient contextual information. We further adopt an attention model in CAM to guide feature re-fusion. Building on these components, we construct the Chained Context Aggregation Network (CANet), which employs an asymmetric decoder to recover precise spatial details in the prediction maps. We conduct extensive experiments on six challenging datasets: Pascal VOC 2012, Pascal Context, Cityscapes, CamVid, SUN-RGBD and GATECH. The results show that CANet achieves state-of-the-art performance.
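The abstract describes CAM only at the block-diagram level. The following is a minimal NumPy sketch, under stated assumptions, of how chained flows, pre-fusion and attention-guided re-fusion might fit together; it is not the authors' implementation. Everything concrete here is hypothetical: the function names (shallow_flow, chained_context_aggregation), the 1x1 channel mixing that stands in for each shallow encoder-decoder, the residual "ladder" connection used as pre-fusion, and the squeeze-and-excitation-style channel gate swapped in for the paper's attention model.

```python
import numpy as np

# Hypothetical sketch of a CAM-like block; not the authors' code.
# Feature maps are (C, H, W) arrays; H and W must be divisible by every scale.

def avg_pool(x, s):
    """Down-sample a (C, H, W) map by an integer factor s (average pooling)."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def upsample(x, s):
    """Nearest-neighbour up-sampling of a (C, H, W) map by factor s."""
    return x.repeat(s, axis=1).repeat(s, axis=2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shallow_flow(x, weight, s):
    """One information flow: down-sample, mix channels (1x1 stand-in for the
    paper's convolutions), apply ReLU, and up-sample back to full resolution."""
    z = avg_pool(x, s)
    z = np.maximum(np.einsum("oc,chw->ohw", weight, z), 0.0)
    return upsample(z, s)

def chained_context_aggregation(x, weights, scales, attn_weight):
    """Serial chain of flows with residual pre-fusion, then gated re-fusion."""
    _, h, w = x.shape
    assert all(h % s == 0 and w % s == 0 for s in scales)
    current, stages = x, []
    for weight, s in zip(weights, scales):
        # Serial path: each flow sees the previous pre-fused output,
        # so its effective receptive field keeps growing.
        current = current + shallow_flow(current, weight, s)  # pre-fusion
        stages.append(current)                                # keep each scale
    fused = sum(stages)
    # Re-fusion: a squeeze-and-excitation-style channel gate computed from
    # the global average of the input (swapped in for the paper's attention).
    gate = sigmoid(attn_weight @ x.mean(axis=(1, 2)))         # shape (C,)
    return x + gate[:, None, None] * fused

rng = np.random.default_rng(0)
C, H, W = 8, 32, 32
scales = [2, 4, 8]
x = rng.standard_normal((C, H, W))
weights = [0.1 * rng.standard_normal((C, C)) for _ in scales]
attn_weight = 0.1 * rng.standard_normal((C, C))
y = chained_context_aggregation(x, weights, scales, attn_weight)
print(y.shape)  # (8, 32, 32)
```

Because each flow consumes the pre-fused output of the previous one, the last flow (scale 8) operates on features that already mix 2x- and 4x-pooled context, which is one way to realize the growing receptive field the abstract attributes to the serial path; keeping every stage's output before fusion plays the role of the parallel, region-based contexts.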
