Paper Title

DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models

Authors

Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein

Abstract

Scene extrapolation -- the idea of generating novel views by flying into a given image -- is a promising, yet challenging task. For each predicted frame, a joint inpainting and 3D refinement problem has to be solved, which is ill-posed and includes a high level of ambiguity. Moreover, training data for long-range scenes is difficult to obtain and usually lacks sufficient views to infer accurate camera poses. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. Utilizing the stochastic nature of the guided denoising steps, we train the diffusion models to refine projected RGBD images but condition the denoising steps on multiple past and future frames for inference. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving consistency significantly better than prior GAN-based methods. DiffDreamer is a powerful and efficient solution for scene extrapolation, producing impressive results despite limited supervision. Project page: https://primecai.github.io/diffdreamer.
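The abstract describes refining re-projected RGBD frames with an image-conditioned diffusion model, where each stochastic denoising step is conditioned on anchor frames. Below is a minimal, hypothetical PyTorch sketch of that idea: a toy conditional denoiser plus a DDPM-style reverse loop. The architecture, beta schedule, and all names (`Denoiser`, `extrapolate_step`, `cond_frames`) are illustrative assumptions for exposition, not DiffDreamer's actual implementation.

```python
# Hypothetical sketch of the idea in the abstract: denoise a projected RGBD
# frame while conditioning every denoising step on past/future anchor frames.
# All module and schedule choices here are assumptions, not the paper's code.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Toy image-conditioned denoiser: predicts noise from the noisy RGBD
    frame concatenated channel-wise with projected anchor frames."""

    def __init__(self, cond_frames: int = 2, channels: int = 4):
        super().__init__()
        in_ch = channels * (1 + cond_frames)  # noisy frame + condition frames
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x_t, cond):
        # cond: (B, cond_frames, C, H, W) projected RGBD anchors
        b, n, c, h, w = cond.shape
        inp = torch.cat([x_t, cond.reshape(b, n * c, h, w)], dim=1)
        return self.net(inp)


@torch.no_grad()
def extrapolate_step(model, cond, num_steps=50, shape=(1, 4, 64, 64)):
    """One extrapolation step: DDPM-style reverse diffusion from pure noise,
    with every denoising update conditioned on the anchor frames.
    The linear beta schedule below is a common default, assumed here."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        eps = model(x, cond)  # noise prediction, conditioned on anchors
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        # Stochasticity of the guided denoising steps: add noise except at t=0.
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # refined RGBD frame for the new camera pose


if __name__ == "__main__":
    model = Denoiser(cond_frames=2)
    # Stand-in for RGBD frames re-projected into the next camera pose.
    cond = torch.randn(1, 2, 4, 64, 64)
    frame = extrapolate_step(model, cond)
    print(frame.shape)  # torch.Size([1, 4, 64, 64])
```

In a trained system, `cond` would hold previously generated frames warped into the new camera pose, and the loop would repeat once per frame along the trajectory; here both the network weights and inputs are random, so the sketch only demonstrates the conditioning and update structure.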
