BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Paper Title

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Authors

Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

Abstract

The low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, which leads to a considerable accuracy decrease. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation. To this end, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context. The Semantic Branch is lightweight due to reducing the channel capacity and a fast-downsampling strategy. Furthermore, we design a Guided Aggregation Layer to enhance mutual connections and fuse both types of feature representation. Besides, a booster training strategy is designed to improve the segmentation performance without any extra inference cost. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs favourably against a few state-of-the-art real-time semantic segmentation approaches. Specifically, for a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.
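
The two headline numbers in the abstract, Mean IoU and FPS, can be made concrete with a small sketch. The code below is an illustration rather than the authors' evaluation code: the function names `mean_iou` and `latency_ms` are hypothetical, and the toy labels are invented for the example. Benchmark tooling typically accumulates counts over an entire test set and ignores unlabeled pixels; this sketch does neither.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    pred, target: flat sequences of integer class labels, equal length.
    Classes absent from both pred and target are skipped (an assumption
    of this sketch, not something the abstract specifies).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


def latency_ms(fps):
    """Per-frame time budget in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


# Toy example with made-up labels for 6 pixels and 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, 3), 3))  # 0.5 (per-class IoUs: 1/3, 2/3, 1/2)

# 156 FPS, the speed quoted in the abstract, leaves about 6.41 ms per frame.
print(round(latency_ms(156), 2))  # 6.41
```

Read this way, the reported 156 FPS corresponds to roughly 6.4 ms of compute per 2,048x1,024 frame on the stated GPU.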
