Paper Title
Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing
Paper Authors
Paper Abstract
Depth data provide geometric information that can bring progress in RGB-D scene parsing tasks. Several recent works propose RGB-D convolution operators that construct receptive fields along the depth-axis to handle 3D neighborhood relations between pixels. However, these methods pre-define the depth receptive fields by hyperparameters, making them dependent on parameter selection. In this paper, we propose a novel operator called malleable 2.5D convolution to learn the receptive field along the depth-axis. A malleable 2.5D convolution has one or more 2D convolution kernels. Our method assigns each pixel to one of the kernels, or to none of them, according to their relative depth differences, and the assignment process is formulated in a differentiable form so that it can be learnt by gradient descent. The proposed operator runs on standard 2D feature maps and can be seamlessly incorporated into pre-trained CNNs. We conduct extensive experiments on two challenging RGB-D semantic segmentation datasets, NYUDv2 and Cityscapes, to validate the effectiveness and the generalization ability of our method.
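The core idea of the abstract — softly assigning each pixel to one of several kernels, or to none, based on relative depth difference, in a differentiable way — can be illustrated with a minimal NumPy sketch. The Gaussian-style gating, the learnable `centers` parameter, the `temperature`, and the fixed "none" logit below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_depth_assignment(rel_depth, centers, temperature=1.0):
    """Softly assign relative depth differences to one of K kernels or
    to 'none'.  Everything here is differentiable, so `centers` and
    `temperature` could in principle be learnt by gradient descent
    (as the paper's assignment is).
    rel_depth:   array of depth(q) - depth(p) values for neighbor pixels.
    centers:     (K,) hypothetical learnable depth-receptive-field centers.
    Returns weights of shape (..., K+1); the last slot is the 'none' option.
    """
    rel_depth = np.asarray(rel_depth, dtype=float)[..., None]   # (..., 1)
    # Gaussian-style response of each kernel to the depth difference:
    # a kernel responds most strongly near its center.
    logits = -((rel_depth - centers) ** 2) / temperature        # (..., K)
    # A fixed baseline logit models "assigned to no kernel"; pixels far
    # from every center fall through to this option.
    none_logit = np.full(logits.shape[:-1] + (1,), -1.0)
    logits = np.concatenate([logits, none_logit], axis=-1)      # (..., K+1)
    # Numerically stable softmax turns responses into soft assignments.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A pixel at the same depth as the center pixel (rel_depth = 0) is mostly
# assigned to the middle kernel; a pixel 2 units away mostly gets 'none'.
w = soft_depth_assignment([0.0, 2.0], centers=np.array([-0.5, 0.0, 0.5]))
```

The convolution output would then be a weighted sum of the K per-kernel responses at each neighbor, using these soft weights; because the weights are a softmax of smooth functions of depth, gradients flow back to the assignment parameters.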