Ordinal-Content VAE: Isolating Ordinal-Valued Content Factors in Deep Latent Variable Models

Paper title

Ordinal-Content VAE: Isolating Ordinal-Valued Content Factors in Deep Latent Variable Models

Authors

Minyoung Kim, Vladimir Pavlovic

Abstract

In deep representational learning, it is often desired to isolate a particular factor (termed content) from other factors (referred to as style). What constitutes the content is typically specified by users through explicit labels in the data, while all unlabeled or unknown factors are regarded as style. Recently, it has been shown that such content-labeled data can be effectively exploited by modifying deep latent factor models (e.g., VAE) so that style and content are well separated in the latent representations. However, that approach assumes that the content factor is categorical-valued (e.g., subject ID in face image data, or digit class in the MNIST dataset). In certain situations, the content is ordinal-valued, that is, the values the content factor takes are ordered rather than categorical, which makes content-labeled VAEs, including the latent space they infer, suboptimal. In this paper, we propose a novel extension of VAE that imposes a partially ordered set (poset) structure in the content latent space, while simultaneously aligning it with the ordinal content values. To this end, instead of the i.i.d. Gaussian latent prior adopted in prior approaches, we introduce a conditional Gaussian spacing prior model. This model admits a tractable joint Gaussian prior, but also effectively places negligible density values on the content latent configurations that violate the poset constraint. To evaluate this model, we consider two specific ordinal structured problems: estimating a subject's age in a face image and elucidating the calorie amount in a food meal image. We demonstrate significant improvements in content-style separation over previous non-ordinal approaches.
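
The abstract does not give the equations of the conditional Gaussian spacing prior, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a scalar content latent per ordinal level, a total order (a special case of a poset), and Gaussian gaps between consecutive levels whose mean `delta` is much larger than their standard deviation `sigma`. The function names and all numeric values are hypothetical. Under these assumptions the latents are linear in Gaussian variables, so their joint distribution is Gaussian, and samples that violate the ordering are extremely unlikely.

```python
import numpy as np


def spacing_prior_samples(num_levels, num_samples, delta=1.0, sigma=0.1, seed=0):
    """Sample scalar content latents z_0..z_{K-1} from a chain of Gaussian gaps.

    Assumed (illustrative) model, not taken from the paper:
        z_0 ~ N(0, 1)
        z_k = z_{k-1} + d_k,  d_k ~ N(delta, sigma^2)
    """
    rng = np.random.default_rng(seed)
    z0 = rng.normal(0.0, 1.0, size=(num_samples, 1))
    gaps = rng.normal(delta, sigma, size=(num_samples, num_levels - 1))
    # Each later latent is the first latent plus the accumulated gaps.
    return np.concatenate([z0, z0 + np.cumsum(gaps, axis=1)], axis=1)


def joint_gaussian(num_levels, delta=1.0, sigma=0.1):
    """Mean and covariance of the joint Gaussian implied by the chain above."""
    k = np.arange(num_levels)
    mean = delta * k
    # Cov(z_i, z_j) = Var(z_0) + sigma^2 * min(i, j) for 0-based indices.
    cov = 1.0 + sigma**2 * np.minimum.outer(k, k)
    return mean, cov


samples = spacing_prior_samples(num_levels=5, num_samples=10000)
# Fraction of sampled configurations in which any consecutive pair is out of order.
violation_rate = np.mean(np.any(np.diff(samples, axis=1) <= 0, axis=1))
print("ordering violation rate:", violation_rate)
```

Choosing `delta` large relative to `sigma` is what keeps the ordering intact in this sketch: a gap would have to fall many standard deviations below its mean to become negative, while the prior itself stays an ordinary multivariate Gaussian.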
