Paper Title
T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up

Paper Authors

Deyin Liu, Lin Yuanbo Wu, Bo Li, Zongyuan Ge

Abstract

In this paper, we present an end-to-end approach to generate high-resolution person images conditioned on text only. State-of-the-art text-to-image generation models are mainly designed for center-object generation, e.g., flowers and birds. Unlike center-placed objects with similar shapes and orientations, person image generation is a more challenging task, for which we observe the following: 1) the generated images for the same person should exhibit visual details with identity-consistency, e.g., identity-related textures/clothes/shoes across the images, and 2) those images should be discriminant so as to be robust against the inter-person variations caused by visual ambiguities. To address the above challenges, we develop an effective generative model that produces person images with two novel mechanisms. In particular, our first mechanism (called T-Person-GAN-ID) integrates a one-stream generator with an identity-preserving network, such that the representations of generated data are regularized in their feature space to ensure identity-consistency. The second mechanism (called T-Person-GAN-ID-MM) is based on manifold mix-up: it produces mixed images via linear interpolation across generated images from different identity manifolds, and we further enforce such interpolated images to be linearly classified in the feature space. This amounts to learning a linear classification boundary that can perfectly separate images from two identities. Our proposed method is empirically validated to achieve a remarkable improvement in text-to-person image generation. Our architecture is orthogonal to StackGAN++ and focuses on person image generation; together they enrich the spectrum of GANs for the image generation task. Code is available at \url{https://github.com/linwu-github/Person-Image-Generation.git}.
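The manifold mix-up mechanism described above can be illustrated with a minimal sketch. The helper below (`manifold_mixup` is a hypothetical name, not from the paper's code) linearly interpolates two feature vectors from different identities with a Beta-sampled coefficient, which is the standard mix-up recipe; the paper additionally trains a classifier so that such interpolated features remain linearly separable, a step omitted here.

```python
import numpy as np

def manifold_mixup(feat_a, feat_b, alpha=2.0, rng=None):
    """Linearly interpolate two identity features (hypothetical helper).

    Returns the mixed feature and the mixing coefficient `lam`; in
    mix-up training, `lam` also weights the soft label assigned to the
    mixed sample, pushing the classifier toward a linear boundary that
    separates the two identities.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    mixed = lam * feat_a + (1.0 - lam) * feat_b
    return mixed, lam

# Toy usage: stand-in features for identity A (all ones) and B (all zeros).
fa = np.ones(4)
fb = np.zeros(4)
mixed, lam = manifold_mixup(fa, fb)
# With these stand-ins, every coordinate of the mixed feature equals lam.
assert np.allclose(mixed, lam)
```

In actual training the inputs would be generator features for two different identities, and the interpolation would feed a linear classification loss rather than a bare assertion.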
