Paper Title
MiniNet: An extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation
Paper Authors
Abstract
Predicting depth from a single image is an attractive research topic since it provides one more dimension of information to enable machines to better perceive the world. Recently, deep learning has emerged as an effective approach to monocular depth estimation. As obtaining labeled data is costly, there is a recent trend to move from supervised learning to unsupervised learning for monocular depth. However, most unsupervised learning methods capable of achieving high depth prediction accuracy require a deep network architecture that is too heavy and complex to run on embedded devices with limited storage and memory. To address this issue, we propose a new powerful network with a recurrent module that achieves the capability of a deep network while maintaining an extremely lightweight size, enabling real-time, high-performance unsupervised monocular depth prediction from video sequences. In addition, a novel efficient upsample block is proposed to fuse the features from the associated encoder layer and recover the spatial size of features with a small number of model parameters. We validate the effectiveness of our approach via extensive experiments on the KITTI dataset. Our new model can run at a speed of about 110 frames per second (fps) on a single GPU, 37 fps on a single CPU, and 2 fps on a Raspberry Pi 3. Moreover, it achieves higher depth accuracy with nearly 33 times fewer model parameters than state-of-the-art models. To the best of our knowledge, this work is the first extremely lightweight neural network trained on monocular video sequences for real-time unsupervised monocular depth estimation, which opens up the possibility of implementing deep learning-based real-time unsupervised monocular depth prediction on low-cost embedded devices.
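The abstract describes an upsample block that fuses features from the associated encoder layer while recovering spatial resolution. The paper's actual block presumably uses learned convolutions; the NumPy sketch below only illustrates the shape-level idea assumed here (nearest-neighbour 2x upsampling of decoder features followed by channel-wise concatenation with the skip connection from the encoder), and the function name `upsample_fuse` is hypothetical, not the authors' API.

```python
import numpy as np

def upsample_fuse(decoder_feat, encoder_feat):
    """Illustrative upsample-and-fuse step (not the paper's exact block).

    decoder_feat: array of shape (C1, H, W) from the decoder path.
    encoder_feat: array of shape (C2, 2H, 2W) from the matching encoder layer.
    Returns an array of shape (C1 + C2, 2H, 2W).
    """
    # Nearest-neighbour 2x upsampling by repeating rows and columns.
    up = decoder_feat.repeat(2, axis=1).repeat(2, axis=2)
    # Skip-connection fusion: concatenate along the channel axis.
    return np.concatenate([up, encoder_feat], axis=0)

# Example: an 8-channel 16x16 decoder map fused with a 4-channel 32x32 skip.
dec = np.zeros((8, 16, 16))
enc = np.zeros((4, 32, 32))
fused = upsample_fuse(dec, enc)
print(fused.shape)  # (12, 32, 32)
```

In a real network, the concatenation would typically be followed by a lightweight convolution to mix the fused channels; keeping that convolution small is one common way such blocks hold down the parameter count.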