GR-GAN: Gradual Refinement Text-to-Image Generation

Paper Title

GR-GAN: Gradual Refinement Text-to-Image Generation

Authors

Yang, Bo; Feng, Fangxiang; Wang, Xiaojie

Abstract

A good text-to-image model should not only generate high-quality images but also ensure consistency between the text and the generated image. Previous models failed to achieve both at once. This paper proposes a Gradual Refinement Generative Adversarial Network (GR-GAN) to alleviate the problem efficiently. A GRG module is designed to generate images stage by stage, from low resolution to high resolution, with the corresponding text constraints moving from coarse granularity (sentence) to fine granularity (word). An ITM module is designed to provide image-text matching losses at both the sentence-image level and the word-region level for the corresponding stages. We also introduce a new metric, Cross-Model Distance (CMD), for simultaneously evaluating image quality and image-text consistency. Experimental results show that GR-GAN significantly outperforms previous models and achieves a new state of the art on both FID and CMD. A detailed analysis demonstrates the efficiency of the different generation stages in GR-GAN.
