Paper Title

Few-Shot Diffusion Models

Paper Authors

Giorgio Giannone, Didrik Nielsen, Ole Winther

Paper Abstract

Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence.
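
The abstract describes the FSDM conditioning pipeline: patches from a small support set are aggregated by a set-based ViT into a context representation, and that context conditions the DDPM generative (denoising) process. The PyTorch snippet below is a minimal illustrative sketch of that flow under simplifying assumptions, not the authors' implementation; all module names, layer sizes, and the toy FiLM-style denoiser (the paper uses a UNet-based DDPM) are hypothetical.

```python
# Minimal sketch: set-based Transformer encoder over support-set patches produces a
# context vector that conditions a toy DDPM noise predictor. Illustrative only.
import torch
import torch.nn as nn


class SetViTEncoder(nn.Module):
    """Aggregates patch tokens from the whole support set into one context vector."""
    def __init__(self, patch=8, dim=128, depth=2, heads=4):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, support):                       # support: (B, S, 3, H, W)
        b, s, c, h, w = support.shape
        tokens = self.patchify(support.reshape(b * s, c, h, w))  # (B*S, dim, h', w')
        tokens = tokens.flatten(2).transpose(1, 2)               # (B*S, P, dim)
        tokens = tokens.reshape(b, -1, tokens.shape[-1])         # pool patches over the set
        return self.encoder(tokens).mean(dim=1)                  # (B, dim) context


class ConditionalDenoiser(nn.Module):
    """Toy noise predictor conditioned on timestep + set context via FiLM (not a UNet)."""
    def __init__(self, dim=128, channels=3, hidden=64):
        super().__init__()
        self.time_embed = nn.Embedding(1000, dim)
        self.film = nn.Linear(dim, 2 * hidden)        # scale/shift from time + context
        self.inp = nn.Conv2d(channels, hidden, 3, padding=1)
        self.out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x_t, t, context):               # x_t: (B, 3, H, W), t: (B,)
        h = self.inp(x_t)
        scale, shift = self.film(self.time_embed(t) + context).chunk(2, dim=1)
        h = torch.relu(h * (1 + scale[:, :, None, None]) + shift[:, :, None, None])
        return self.out(h)                            # predicted noise epsilon


# One conditional DDPM training step: noise a target image from the same class as the
# 5-shot support set, then regress the noise given the aggregated set context.
encoder, denoiser = SetViTEncoder(), ConditionalDenoiser()
support = torch.randn(4, 5, 3, 32, 32)                # 5 support images per batch element
x0 = torch.randn(4, 3, 32, 32)                        # target image from the same class
t = torch.randint(0, 1000, (4,))
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)[t]
noise = torch.randn_like(x0)
x_t = alpha_bar.sqrt()[:, None, None, None] * x0 \
    + (1 - alpha_bar).sqrt()[:, None, None, None] * noise
loss = ((denoiser(x_t, t, encoder(support)) - noise) ** 2).mean()
loss.backward()
```

At sampling time the same support-set context would condition every reverse-diffusion step, which is how such a model can generate from a previously unseen class given only a handful of example images.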
