Paper Title

Disentangling Variational Autoencoders

Authors

Pastrana, Rafael

Abstract

A variational autoencoder (VAE) is a probabilistic machine learning framework for posterior inference that projects an input set of high-dimensional data to a lower-dimensional, latent space. The latent space learned with a VAE offers exciting opportunities to develop new data-driven design processes in creative disciplines, in particular, to automate the generation of multiple novel designs that are aesthetically reminiscent of the input data but that were unseen during training. However, the learned latent space is typically disorganized and entangled: traversing the latent space along a single dimension does not result in changes to single visual attributes of the data. The lack of latent structure impedes designers from deliberately controlling the visual attributes of new designs generated from the latent space. This paper presents an experimental study that investigates latent space disentanglement. We implement three different VAE models from the literature and train them on a publicly available dataset of 60,000 images of hand-written digits. We perform a sensitivity analysis to find a small number of latent dimensions necessary to maximize a lower bound to the log marginal likelihood of the data. Furthermore, we investigate the trade-offs between the quality of the reconstruction of the decoded images and the level of disentanglement of the latent space. We are able to automatically align three latent dimensions with three interpretable visual properties of the digits: line weight, tilt and width. Our experiments suggest that i) increasing the contribution of the Kullback-Leibler divergence between the prior over the latents and the variational distribution to the evidence lower bound, and ii) conditioning on the input image class enhance the learning of a disentangled latent space with a VAE.
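
For reference, the KL re-weighting described in finding (i) corresponds to the β-VAE objective familiar from the literature (Higgins et al., 2017), in which a coefficient β > 1 increases the contribution of the KL term to the evidence lower bound. The notation below is the standard one, not taken from this paper:

```latex
\mathcal{L}_{\beta}(\theta,\phi;\mathbf{x}) =
  \mathbb{E}_{q_{\phi}(\mathbf{z}\mid\mathbf{x})}\!\left[\log p_{\theta}(\mathbf{x}\mid\mathbf{z})\right]
  - \beta \, D_{\mathrm{KL}}\!\left(q_{\phi}(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\right)
```

A minimal PyTorch-style sketch of this loss (negated ELBO, for minimization), assuming a diagonal-Gaussian posterior and a Bernoulli decoder; `beta_vae_loss` and its arguments are illustrative names, not code from the paper:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Negative beta-weighted ELBO for one batch."""
    # Reconstruction term: Bernoulli log-likelihood of the input under the
    # decoder, computed as binary cross-entropy summed over pixels.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL divergence between the diagonal-Gaussian posterior
    # q(z|x) = N(mu, diag(exp(logvar))) and the standard-normal prior p(z).
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    # beta > 1 up-weights the KL term; per the abstract, this pressure
    # toward the prior encourages a more disentangled latent space.
    return recon + beta * kl
```

Finding (ii) corresponds to a conditional VAE setup, where the class label is provided as an additional input to the encoder and decoder (for MNIST-style digits, typically a one-hot vector concatenated with the image and the latent code).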
