Paper title

A Compacted Structure for Cross-domain learning on Monocular Depth and Flow Estimation

Authors

Chen, Yu; Cao, Xu; Lin, Xiaoyi; Huang, Baoru; Zhou, Xiao-Yun; Zheng, Jian-Qing; Yang, Guang-Zhong

Abstract

Accurate motion and depth recovery is important for many robot vision tasks, including autonomous driving. Most previous studies have achieved cooperative multi-task interaction via either pre-defined loss functions or cross-domain prediction. This paper presents a multi-task scheme that achieves mutual assistance by means of our Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA) mechanisms. The F2D and D2F mechanisms enable multi-scale information integration between the optical flow and depth domains based on differentiable shallow nets. A dual-head mechanism predicts optical flow for rigid and non-rigid motion in a divide-and-conquer manner, which significantly improves optical flow estimation performance. Furthermore, to make the prediction more robust and stable, EMA is used in our multi-task training. Experimental results on the KITTI datasets show that our multi-task scheme outperforms other multi-task schemes and provides marked improvements in the prediction results.
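
The abstract does not say how EMA enters training. As a rough illustration only, the sketch below assumes the common form in which a smoothed "shadow" copy of the model parameters is updated after each training step. The function name `ema_update`, the dictionary layout, and the decay value are hypothetical and not taken from the paper.

```python
def ema_update(shadow, params, decay=0.99):
    """Return new EMA weights: decay * shadow + (1 - decay) * params.

    `shadow` and `params` are plain dicts mapping parameter names to floats.
    This layout is an assumption made for the sketch, not the paper's code.
    """
    return {name: decay * shadow[name] + (1.0 - decay) * params[name]
            for name in params}


# Toy run: the raw parameter stays at 1.0 and the shadow copy moves toward it.
shadow = {"w": 0.0}
for current_w in [1.0, 1.0, 1.0]:
    shadow = ema_update(shadow, {"w": current_w}, decay=0.5)

print(shadow["w"])  # 0.875
```

A decay closer to 1 makes the shadow weights change more slowly, which is the usual reason EMA is said to stabilise predictions.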
