Paper Title

DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation

Authors

Azade Farshad, Yousef Yeganeh, Helisa Dhamo, Federico Tombari, Nassir Navab

Abstract

Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or the edges in the graph. Although existing works have shown promising results in modifying the placement and pose of objects, scene manipulation often leads to losing some visual characteristics like the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs in a self-supervised manner. Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph. In addition to producing more realistic images due to the decomposition of features like pose and identity, our method takes advantage of the probabilistic sampling in the intermediate features to generate more diverse images in object replacement or addition tasks. The results of our experiments show that disentangling the feature representations in the latent manifold of the model outperforms the previous works qualitatively and quantitatively on two public benchmarks. Project Page: https://scenegenie.github.io/DispositioNet/
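To make the core idea more concrete, here is a minimal, hypothetical PyTorch sketch of the kind of disentanglement the abstract describes: a per-object feature from the scene graph is mapped to two independent variational latents, one for pose and one for identity/appearance, each sampled via the reparameterization trick. All names, dimensions, and the module structure are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DisentangledObjectEncoder(nn.Module):
    """Toy encoder that splits an object feature into separate
    variational pose and identity embeddings."""

    def __init__(self, in_dim=256, latent_dim=64):
        super().__init__()
        # Two independent heads: one for pose, one for identity/appearance.
        # Each predicts a mean and a log-variance for its latent.
        self.pose_head = nn.Linear(in_dim, 2 * latent_dim)
        self.identity_head = nn.Linear(in_dim, 2 * latent_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        # Sample z ~ N(mu, sigma^2) in a differentiable way.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, obj_feat):
        pose_mu, pose_logvar = self.pose_head(obj_feat).chunk(2, dim=-1)
        id_mu, id_logvar = self.identity_head(obj_feat).chunk(2, dim=-1)
        z_pose = self.reparameterize(pose_mu, pose_logvar)
        z_id = self.reparameterize(id_mu, id_logvar)
        # The concatenated latent would feed an image decoder; sampling the
        # pose and identity latents independently is what allows diverse
        # outputs in object replacement or addition.
        return torch.cat([z_pose, z_id], dim=-1)

# Example usage on a batch of per-object node features.
encoder = DisentangledObjectEncoder()
obj_features = torch.randn(8, 256)   # 8 objects, 256-d features each
latents = encoder(obj_features)      # shape: (8, 128)
```

Keeping the two latent heads separate is what lets an edit to the graph change an object's pose while its identity embedding, and therefore its appearance, is preserved.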
