Towards Dense People Detection with Deep Learning and Depth images

Paper title

Towards Dense People Detection with Deep Learning and Depth images

Authors

Fuentes-Jimenez, David; Losada-Gutierrez, Cristina; Casillas-Perez, David; Macias-Guarasa, Javier; Martin-Lopez, Roberto; Pizarro, Daniel; Luna, Carlos A.

Abstract

This paper proposes a DNN-based system that detects multiple people from a single depth image. Our neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution centered at the person's head. The likelihood map encodes both the number of detected people and their 2D image positions, and can be used to recover the 3D position of each person using the depth image and the camera calibration parameters. Our architecture is compact, using separated convolutions to increase performance, and runs in real time on low-budget GPUs. We first train the network on simulated data and then fine-tune it with a relatively small amount of real data. We show this strategy to be effective, producing networks that generalize to scenes different from those used during training. We thoroughly compare our method against the existing state of the art, including both classical and DNN-based solutions. Our method outperforms existing methods and can accurately detect people in scenes with significant occlusions.
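The likelihood-map representation described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the Gaussian width `sigma`, the peak threshold, the 3x3 local-maximum rule, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are all assumptions made for illustration, and the map here is synthesized directly instead of being predicted by a network.

```python
import numpy as np

def gaussian_likelihood_map(shape, heads, sigma=3.0):
    """Synthesize a map with one Gaussian bump per head position (row, col).
    sigma is an assumed value; the abstract does not specify one."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    out = np.zeros(shape, dtype=np.float64)
    for r, c in heads:
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        out = np.maximum(out, bump)
    return out

def find_peaks(likelihood, threshold=0.5):
    """Return (row, col) pixels above threshold that are maxima of their
    3x3 neighbourhood. A simple assumed rule, not the paper's procedure."""
    h, w = likelihood.shape
    padded = np.pad(likelihood, 1, mode="constant", constant_values=-np.inf)
    neighbours = [
        padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
    ]
    is_max = np.all([likelihood >= n for n in neighbours], axis=0)
    peaks = np.argwhere(is_max & (likelihood > threshold))
    return [tuple(int(v) for v in p) for p in peaks]

def backproject(peaks, depth, fx, fy, cx, cy):
    """Recover 3D camera-frame points from 2D peaks, assuming a pinhole
    camera and depth values measured along the optical axis."""
    points = []
    for r, c in peaks:
        z = float(depth[r, c])
        points.append(((c - cx) * z / fx, (r - cy) * z / fy, z))
    return points

# Hypothetical scene: two heads in a 64x96 image, flat depth of 2.0 units.
heads = [(20, 30), (40, 70)]
likelihood = gaussian_likelihood_map((64, 96), heads)
peaks = find_peaks(likelihood)
depth = np.full((64, 96), 2.0)
points = backproject(peaks, depth, fx=100.0, fy=100.0, cx=48.0, cy=32.0)
```

In this sketch the number of peaks gives the person count, the peak coordinates give the 2D positions, and the depth value at each peak together with the assumed intrinsics gives a 3D position.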
