Paper Title

Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation

Authors

Runbang Zhang, Yixiao Zhang, Kai Shao, Ying Shan, Gus Xia

Abstract

In this study, we explore the representation mapping from the domain of visual arts to the domain of music, with which we can use visual arts as an effective handle to control music generation. Unlike most studies in multimodal representation learning, which are purely data-driven, we adopt an analysis-by-synthesis approach that combines deep music representation learning with user studies. Such an approach enables us to discover interpretable representation mappings without a huge amount of paired data. In particular, we discover that the visual-to-music mapping has a nice property similar to equivariance: we can use various image transformations, such as changing brightness, changing contrast, or applying style transfer, to control the corresponding transformations in the music domain. In addition, we release the Vis2Mus system as a controllable interface for symbolic music generation.
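
As a rough illustration of the equivariance-like property described above, here is a minimal Python sketch. The functions encode_image and map_to_music_latent are hypothetical placeholders, not the authors' actual models; the idea is only that an image-domain edit (e.g. a brightness change) induces a corresponding shift in the mapped music-domain representation.

```python
import numpy as np
from PIL import Image, ImageEnhance

# Hypothetical stand-ins for the learned components described in the
# abstract; the actual Vis2Mus system uses deep representation learning
# and decodes to symbolic music, which is not reproduced here.
def encode_image(img: Image.Image) -> np.ndarray:
    """Placeholder image encoder: a downsampled grayscale thumbnail."""
    small = img.convert("L").resize((8, 8))
    return np.asarray(small, dtype=np.float32).ravel() / 255.0

def map_to_music_latent(z_image: np.ndarray) -> np.ndarray:
    """Placeholder linear visual-to-music mapping (fixed random weights)."""
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((16, z_image.size)).astype(np.float32)
    return weights @ z_image

# A toy image and an image-domain transformation (increase brightness).
img = Image.new("RGB", (64, 64), color=(120, 60, 200))
brighter = ImageEnhance.Brightness(img).enhance(1.5)

# Equivariance-like behavior would mean the image-domain edit induces a
# systematic, corresponding shift in the mapped music-domain latent.
z_original = map_to_music_latent(encode_image(img))
z_brighter = map_to_music_latent(encode_image(brighter))
print("music-latent shift from brightness edit:",
      np.linalg.norm(z_brighter - z_original))
```

In the real system, such latent shifts would be decoded into the corresponding transformations of symbolic music.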
