Paper Title
RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT
Paper Authors
Paper Abstract
Depth estimation and object detection play an important role in deep-learning-based autonomous driving. We propose a hybrid structure called RealNet: a co-design method combining a model-streamlined recognition algorithm with an information-fusion depth estimation algorithm, deployed on a Jetson Nano for unmanned vehicles equipped with a monocular vision sensor. We use ROS for the experiments. The method proposed in this paper is suitable for mobile platforms with strict real-time requirements. The innovation of our method is the use of information fusion to compensate for the insufficient frame rate of the output depth maps and to improve the robustness of target detection and depth estimation under monocular vision. Object detection is based on YOLO-v5. We simplified its DarkNet53 backbone and achieved a per-frame prediction time as low as 0.01 s. Depth estimation is based on VNL depth estimation, which considers multiple geometric constraints in 3D global space: the loss is computed as the deviation between the virtual normal vectors (VN) of the prediction and of the label, which recovers richer depth information. We use a PnP fusion algorithm to address the insufficient frame rate of the depth-map output: it estimates depth from the motion between 3D targets and their 2D projections using corner-feature matching, which is faster than the VNL computation. We interpolate the VNL output and the PnP output to achieve information fusion. Experiments show that this effectively eliminates jitter in the depth information and improves robustness. On the control side, the method combines the object detection and depth estimation results to compute the target position and uses a pure pursuit control algorithm to track it.
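As a minimal sketch of the virtual-normal idea the abstract refers to (function names are our own illustration, not the paper's code): a virtual normal is the unit normal of the plane through three non-colinear 3D points sampled from the scene, and the VNL-style loss penalizes the deviation between the normals derived from the predicted depth and those derived from the label.

```python
import math

def virtual_normal(p0, p1, p2):
    """Unit normal of the plane through three non-colinear 3D points
    (a 'virtual normal' in the VNL sense)."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    return [c / norm for c in n]

def vn_loss(pred_triples, gt_triples):
    """Mean L1 deviation between predicted and label virtual normals,
    averaged over sampled point triples (a simplified stand-in for the
    paper's loss)."""
    total = 0.0
    for pt, gt in zip(pred_triples, gt_triples):
        n_pred = virtual_normal(*pt)
        n_gt = virtual_normal(*gt)
        total += sum(abs(a - b) for a, b in zip(n_pred, n_gt))
    return total / len(pred_triples)
```

In the actual method the triples are sampled from the point cloud reprojected from the predicted depth map; here they are passed in explicitly for clarity.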
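The interpolation step described above can be sketched as follows (a simplified illustration under our own assumptions, not the paper's implementation): between two accurate but slow VNL depth frames, the fused depth at an intermediate time is a linear blend of the last VNL estimate and the latest fast PnP estimate.

```python
def fuse_depth(t, t_vnl, d_vnl, t_pnp, d_pnp):
    """Linearly interpolate the target depth at query time t between the
    last slow VNL estimate (t_vnl, d_vnl) and the latest fast PnP
    estimate (t_pnp, d_pnp), with t_vnl <= t <= t_pnp."""
    if t_pnp == t_vnl:
        return d_vnl
    w = (t - t_vnl) / (t_pnp - t_vnl)  # 0 at the VNL frame, 1 at the PnP frame
    w = max(0.0, min(1.0, w))          # clamp against clock jitter
    return (1.0 - w) * d_vnl + w * d_pnp
```

Blending toward the PnP estimate between VNL frames is what smooths out the frame-rate gap and suppresses depth jitter.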
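The pure pursuit controller mentioned at the end follows a standard bicycle-model formula; a minimal sketch (our own, with the target point assumed to be given in the vehicle frame, x forward and y left):

```python
import math

def pure_pursuit_steering(target_x, target_y, wheelbase):
    """Standard pure pursuit: steering angle that drives the vehicle
    along a circular arc through a target point in the vehicle frame."""
    ld = math.hypot(target_x, target_y)      # lookahead distance to target
    alpha = math.atan2(target_y, target_x)   # heading error to target
    kappa = 2.0 * math.sin(alpha) / ld       # curvature of the pursuit arc
    return math.atan(kappa * wheelbase)      # bicycle-model steering angle
```

Here the target point would come from combining the detector's bounding box with the fused depth estimate, as the abstract describes.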