Paper Title
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
Paper Authors
Paper Abstract
Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction. In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision. Central to our method is a novel image denoising architecture that generates and renders an intermediate three-dimensional representation of a scene in each denoising step. This enforces a strong inductive structure within the diffusion process, providing a 3D consistent representation while only requiring 2D supervision. The resulting 3D representation can be rendered from any view. We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images. Additionally, our diffusion-based approach allows us to use 2D inpainting to edit 3D scenes.
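The abstract's key idea is the structure of each denoising step: the network maps the noisy image to an intermediate 3D scene representation and renders it back to image space, so the denoising prediction is 3D-consistent by construction. The sketch below illustrates that per-step structure only; it is a minimal, hypothetical outline, not the paper's implementation. The module names (RenderDiffusionStep, encoder, renderer), the choice of a deterministic DDIM-style update, and all tensor shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RenderDiffusionStep(nn.Module):
    """Sketch of one denoising step with an intermediate 3D representation.

    `encoder` and `renderer` are placeholder submodules (assumed here):
    the encoder maps a noisy image and timestep to a 3D representation
    (e.g. feature planes), and the renderer produces an image of that
    representation from a given camera.
    """

    def __init__(self, encoder: nn.Module, renderer: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.renderer = renderer

    def forward(self, x_t, t, camera, alpha_bar_t, alpha_bar_prev):
        # 1) Infer an intermediate 3D scene representation from the noisy image.
        scene_3d = self.encoder(x_t, t)

        # 2) Render that representation from the input camera; the rendering
        #    serves as the prediction of the clean image x_0.
        x0_pred = self.renderer(scene_3d, camera)

        # 3) Illustrative DDIM-style update from the x_0 prediction
        #    (a simplification; the actual sampler may differ).
        eps_pred = (x_t - alpha_bar_t.sqrt() * x0_pred) / (1 - alpha_bar_t).sqrt()
        x_prev = alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_pred

        # The same scene_3d can also be rendered from other cameras,
        # which is what gives view-consistent 3D generation.
        return x_prev, scene_3d
```

Under this reading, 3D consistency comes from the inductive bias of the rendering step rather than from 3D supervision: only the rendered 2D images are compared to data during training.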