Paper Title

Understanding Road Layout from Videos as a Whole

Paper Authors

Buyu Liu, Bingbing Zhuang, Samuel Schulter, Pan Ji, Manmohan Chandraker

Paper Abstract

In this paper, we address the problem of inferring the layout of complex road scenes from video sequences. To this end, we formulate it as a top-view road attributes prediction problem and our goal is to predict these attributes for each frame both accurately and consistently. In contrast to prior work, we exploit the following three novel aspects: leveraging camera motions in videos, including context cues and incorporating long-term video information. Specifically, we introduce a model that aims to enforce prediction consistency in videos. Our model consists of one LSTM and one Feature Transform Module (FTM). The former implicitly incorporates the consistency constraint with its hidden states, and the latter explicitly takes the camera motion into consideration when aggregating information along videos. Moreover, we propose to incorporate context information by introducing road participants, e.g. objects, into our model. When the entire video sequence is available, our model is also able to encode both local and global cues, e.g. information from both past and future frames. Experiments on two data sets show that: (1) Incorporating either global or contextual cues improves the prediction accuracy and leveraging both gives the best performance. (2) Introducing the LSTM and FTM modules improves the prediction consistency in videos. (3) The proposed method outperforms the SOTA by a large margin.
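The abstract describes two mechanisms: an LSTM whose hidden state carries temporal consistency across frames, and an FTM that warps the previous frame's top-view features by the camera ego-motion before aggregation. Below is a minimal PyTorch sketch (not the authors' released code) of how such a pipeline could be wired together. All names (`FeatureTransformModule`, `RoadLayoutRNN`), tensor shapes, and the 2-D rigid-motion parameterization (yaw, tx, ty) are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of the abstract's two ideas: (1) an LSTM carrying a
# hidden state across frames for temporal consistency, and (2) an FTM
# that warps the previous frame's top-view feature map by the camera
# ego-motion before aggregation. Shapes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureTransformModule(nn.Module):
    """Warps a top-view feature map by a per-frame 2-D rigid ego-motion.

    `motion` is assumed to be (yaw, tx, ty) per sample, expressed in the
    normalized coordinates of the top-view grid.
    """
    def forward(self, feat, motion):
        yaw, tx, ty = motion[:, 0], motion[:, 1], motion[:, 2]
        cos, sin = torch.cos(yaw), torch.sin(yaw)
        # Build a batch of 2x3 affine matrices for grid_sample.
        theta = torch.stack([
            torch.stack([cos, -sin, tx], dim=1),
            torch.stack([sin,  cos, ty], dim=1),
        ], dim=1)                                            # (B, 2, 3)
        grid = F.affine_grid(theta, feat.shape, align_corners=False)
        return F.grid_sample(feat, grid, align_corners=False)

class RoadLayoutRNN(nn.Module):
    """Per-frame top-view features -> temporally consistent attribute logits."""
    def __init__(self, feat_dim=64, hidden=128, num_attrs=10):
        super().__init__()
        self.ftm = FeatureTransformModule()
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_attrs)

    def forward(self, feats, motions):
        # feats: (B, T, C, H, W) top-view features; motions: (B, T, 3),
        # with motions[:, 0] assumed to be the identity (all zeros).
        B, T, C, H, W = feats.shape
        prev, fused = feats[:, 0], []
        for t in range(T):
            warped = self.ftm(prev, motions[:, t])           # align t-1 to t
            x = self.fuse(torch.cat([feats[:, t], warped], dim=1))
            fused.append(x.mean(dim=(2, 3)))                 # global pool
            prev = x
        h, _ = self.lstm(torch.stack(fused, dim=1))          # (B, T, hidden)
        return self.head(h)                                  # (B, T, num_attrs)
```

When the entire sequence is available offline, as in the abstract's "whole video" setting, the `nn.LSTM` could be constructed with `bidirectional=True` (and the head widened accordingly) so that each frame's prediction draws on both past and future frames.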
