Paper Title

Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors

Paper Authors

Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo

Paper Abstract

Recently, text-to-image diffusion models have shown remarkable capabilities in creating realistic images from natural language prompts. However, few works have explored using these models for semantic localization or grounding. In this work, we explore how an off-the-shelf text-to-image diffusion model, trained without exposure to localization information, can ground various semantic phrases without segmentation-specific re-training. We introduce an inference-time optimization process capable of generating segmentation masks conditioned on natural language prompts. Our proposal, Peekaboo, is a first-of-its-kind zero-shot, open-vocabulary, unsupervised semantic grounding technique leveraging diffusion models without any training. We evaluate Peekaboo on the Pascal VOC dataset for unsupervised semantic segmentation and the RefCOCO dataset for referring segmentation, showing competitive and promising results. We also demonstrate how Peekaboo can be used to generate images with transparency, even though the underlying diffusion model was trained only on RGB images - to our knowledge, we are the first to attempt this. Please see our project page, including our code: https://ryanndagreat.github.io/peekaboo
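To make the inference-time optimization idea in the abstract concrete, here is a minimal sketch, assuming a pretrained Stable Diffusion model accessed through Hugging Face diffusers: a learnable per-pixel alpha mask is optimized so that the masked region of the image best explains the text prompt under the frozen model's denoising loss. The model ID, example prompt, random-background compositing, loss, step count, and learning rate below are all illustrative assumptions rather than the authors' exact implementation, which is available at the project page above.

```python
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# Freeze the pretrained model; only the mask is optimized at inference time.
for module in (pipe.unet, pipe.vae, pipe.text_encoder):
    module.requires_grad_(False)

# Encode the query phrase to ground in the image.
prompt = "a photo of a dog"  # hypothetical example phrase
tokens = pipe.tokenizer(
    prompt, padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
)
text_emb = pipe.text_encoder(tokens.input_ids.to(device))[0]

# The RGB image to segment, as a (1, 3, 512, 512) tensor scaled to [-1, 1].
image = torch.rand(1, 3, 512, 512, device=device) * 2 - 1  # placeholder input

# Learnable per-pixel mask logits; a sigmoid turns them into a soft alpha matte.
mask_logits = torch.zeros(1, 1, 512, 512, device=device, requires_grad=True)
optimizer = torch.optim.Adam([mask_logits], lr=1e-2)

for step in range(200):  # illustrative step count
    alpha = torch.sigmoid(mask_logits)
    # Composite the image over a random background through the mask, so the
    # masked region alone must account for the prompt.
    background = torch.rand_like(image) * 2 - 1
    composite = alpha * image + (1 - alpha) * background

    # Standard latent-diffusion denoising loss on the composite, conditioned
    # on the prompt; its gradient w.r.t. the mask is the grounding signal.
    latents = pipe.vae.encode(composite).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, pipe.scheduler.config.num_train_timesteps, (1,), device=device)
    noisy_latents = pipe.scheduler.add_noise(latents, noise, t)
    noise_pred = pipe.unet(noisy_latents, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(noise_pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Threshold the optimized alpha matte to obtain a binary segmentation mask.
mask = (torch.sigmoid(mask_logits) > 0.5).float()
```

The same optimized alpha matte also suggests how transparency generation could work: because the mask is continuous, the foreground can be composited over arbitrary backgrounds, even though the underlying model only ever saw RGB images.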
