Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images

Paper Title

Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images

Authors

Nagabhushan Somraj, Pranali Sancheti, Rajiv Soundararajan

Abstract

The challenge of graphically rendering high frame-rate videos on low compute devices can be addressed through periodic prediction of future frames to enhance the user experience in virtual reality applications. This is studied through the problem of temporal view synthesis (TVS), where the goal is to predict the next frames of a video given the previous frames and the head poses of the previous and the next frames. In this work, we consider the TVS of dynamic scenes in which both the user and objects are moving. We design a framework that decouples the motion into user and object motion to effectively use the available user motion while predicting the next frames. We predict the motion of objects by isolating and estimating the 3D object motion in the past frames and then extrapolating it. We employ multi-plane images (MPI) as a 3D representation of the scenes and model the object motion as the 3D displacement between the corresponding points in the MPI representation. In order to handle the sparsity in MPIs while estimating the motion, we incorporate partial convolutions and masked correlation layers to estimate corresponding points. The predicted object motion is then integrated with the given user or camera motion to generate the next frame. Using a disocclusion infilling module, we synthesize the regions uncovered due to the camera and object motion. We develop a new synthetic dataset for TVS of dynamic scenes consisting of 800 videos at full HD resolution. We show through experiments on our dataset and the MPI Sintel dataset that our model outperforms all the competing methods in the literature.
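
To make the multi-plane image (MPI) representation mentioned in the abstract more concrete, here is a minimal sketch of how an MPI is commonly rendered into a 2D frame: the planes are blended from farthest to nearest with the standard "over" operation. This is a generic illustration, not the authors' implementation; the function name `composite_mpi`, the nested-list layout, and the single-channel colors are assumptions made for brevity.

```python
def composite_mpi(colors, alphas):
    """Render one frame from an MPI by back-to-front "over" compositing.

    colors, alphas: lists of planes ordered from farthest to nearest.
    Each plane is a 2D list (rows x cols) of floats in [0, 1].
    An alpha of 0 marks an empty location on a plane, which is why
    MPIs are sparse: most planes are empty at most pixels.
    (Hypothetical layout chosen for illustration only.)
    """
    rows, cols = len(colors[0]), len(colors[0][0])
    image = [[0.0] * cols for _ in range(rows)]
    for color, alpha in zip(colors, alphas):  # far plane first, near plane last
        for r in range(rows):
            for c in range(cols):
                a = alpha[r][c]
                # Nearer content covers what has been accumulated so far.
                image[r][c] = a * color[r][c] + (1.0 - a) * image[r][c]
    return image


# Two planes, one row, two pixels: an opaque far background and a near
# plane that is occupied only at the first pixel.
far_color, far_alpha = [[0.2, 0.2]], [[1.0, 1.0]]
near_color, near_alpha = [[0.8, 0.8]], [[1.0, 0.0]]
frame = composite_mpi([far_color, near_color], [far_alpha, near_alpha])
```

In this toy example the first pixel shows the near plane and the second shows the background, because the near plane is empty there. Empty locations of this kind are the sparsity the abstract refers to when it motivates partial convolutions and masked correlation layers.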
