Paper Title
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
Paper Authors
Paper Abstract
Diffusion models have shown impressive results in text-to-image synthesis. Using massive datasets of captioned images, diffusion models learn to generate raster images of highly diverse objects and scenes. However, designers frequently use vector representations of images, like Scalable Vector Graphics (SVGs), for digital icons or art. Vector graphics can be scaled to any size and are compact. We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. We do so without access to large datasets of captioned SVGs. By optimizing a differentiable vector graphics rasterizer, our method, VectorFusion, distills abstract semantic knowledge out of a pretrained diffusion model. Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling. To accelerate generation and improve fidelity, VectorFusion also initializes from an image sample. Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches. See our project webpage at https://ajayj.com/vectorfusion.
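The abstract's core idea, optimizing SVG parameters through a differentiable rasterizer under a Score Distillation Sampling (SDS) loss, can be sketched in a few lines. The following is a minimal PyTorch sketch, not the paper's actual implementation: `rasterize` (a differentiable rasterizer such as diffvg provides), `diffusion_eps` (the frozen text-conditioned noise predictor), and `alphas_cumprod` (the diffusion noise schedule) are hypothetical stand-ins, and the timestep range and weighting are common choices rather than the paper's exact settings.

```python
import torch

def sds_step(svg_params, optimizer, rasterize, diffusion_eps,
             alphas_cumprod, text_embedding):
    """One SDS optimization step: render the SVG, perturb the render
    with noise, and use the frozen diffusion model's denoising error
    as a gradient on the SVG parameters. All callables are assumed
    stand-ins, not the paper's code."""
    optimizer.zero_grad()
    # Differentiable render of the current SVG, shape (1, 3, H, W) in [0, 1].
    x = rasterize(svg_params)
    # Sample a diffusion timestep and noise the render accordingly.
    t = torch.randint(50, 950, (1,), device=x.device)
    alpha_bar = alphas_cumprod[t].view(1, 1, 1, 1)
    eps = torch.randn_like(x)
    x_t = alpha_bar.sqrt() * x + (1.0 - alpha_bar).sqrt() * eps
    # Query the frozen, text-conditioned diffusion model; no gradients
    # flow through the model itself.
    with torch.no_grad():
        eps_hat = diffusion_eps(x_t, t, text_embedding)
    # SDS gradient w.r.t. the rendered image: w(t) * (eps_hat - eps),
    # with a common weighting choice w(t) = 1 - alpha_bar.
    grad = (1.0 - alpha_bar) * (eps_hat - eps)
    # Inject this gradient through the rasterizer back into svg_params.
    x.backward(gradient=grad)
    optimizer.step()
```

Calling `sds_step` in a loop updates the vector primitives directly, so the result stays a valid SVG throughout optimization; initializing `svg_params` from a traced image sample, as the abstract notes, speeds convergence.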