IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion

Paper Title

IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion

Authors

Seungjun Moon, Gyeong-Moon Park

Abstract

Manipulation of real-world images has advanced considerably alongside the development of Generative Adversarial Networks (GANs) and their corresponding encoders, which embed real-world images into the latent space. However, designing GAN encoders remains challenging because of the trade-off between distortion and perception. In this paper, we point out that existing encoders try to lower the distortion not only on the interest region, e.g., the human facial region, but also on the uninterest region, e.g., background patterns and obstacles. However, most uninterest regions in real-world images are out-of-distribution (OOD), so generative models cannot reconstruct them faithfully. Moreover, we empirically find that an uninterest region overlapping the interest region can mangle the original features of the interest region, e.g., a microphone overlapping a facial region is inverted into a white beard. As a result, lowering the distortion of the whole image while maintaining perceptual quality is very challenging. To overcome this trade-off, we propose a simple yet effective encoder training scheme, coined IntereStyle, which facilitates encoding by focusing on the interest region. IntereStyle steers the encoder to disentangle the encodings of the interest and uninterest regions. To this end, we iteratively filter out the information of the uninterest region to limit its negative impact. We demonstrate that IntereStyle achieves both lower distortion and higher perceptual quality than existing state-of-the-art encoders. In particular, our model robustly preserves the features of the original images, which yields robust image editing and style mixing results. We will release our code with the pre-trained model after the review.
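To make the idea of "focusing on the interest region" concrete, the sketch below computes a distortion score (mean squared error) only over pixels marked as interest by a binary mask. This is an illustrative assumption, not the authors' training scheme: the function name `masked_distortion`, the use of MSE, and the representation of images as nested lists of grayscale values are all hypothetical choices made for the example, and the abstract does not specify how the iterative filtering is implemented.

```python
def masked_distortion(original, reconstruction, interest_mask):
    """Mean squared error over pixels where interest_mask is nonzero.

    Hypothetical helper for illustration: images are lists of rows of
    grayscale values, and the mask uses 1 for interest, 0 for uninterest.
    """
    total = 0.0
    covered = 0
    for row_o, row_r, row_m in zip(original, reconstruction, interest_mask):
        for o, r, m in zip(row_o, row_r, row_m):
            if m:  # skip uninterest pixels entirely
                total += (o - r) ** 2
                covered += 1
    # An empty interest region contributes no distortion.
    return total / covered if covered else 0.0


# Toy 2x2 example: the top-left pixel stands in for an occluder
# (uninterest), the bottom-right pixel for the face (interest).
original = [[0.0, 0.0], [0.0, 0.0]]
reconstruction = [[1.0, 0.0], [0.0, 2.0]]
interest_only = [[0, 0], [0, 1]]
whole_image = [[1, 1], [1, 1]]

interest_error = masked_distortion(original, reconstruction, interest_only)
full_error = masked_distortion(original, reconstruction, whole_image)
```

With the interest-only mask, the error on the occluder pixel is ignored, so an encoder scored this way would not be rewarded for reproducing OOD content outside the interest region.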
