MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

Paper Title

MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

Authors

Chang-Bin Jeon, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon, Kyogu Lee

Abstract

Separation of multiple singing voices into each voice is a rarely studied area in music source separation research. The absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation. We specify the problem definition in this dataset by categorizing it into i) unison, ii) duet, iii) main vs. rest, and iv) N-singing separation. Second, to overcome the absence of existing multi-singing datasets for a training purpose, we present a strategy for construction of multiple singing mixtures using various single-singing datasets. Third, we propose the improved super-resolution network (iSRNet), which greatly enhances initial estimates of separation networks. Jointly trained with the Conv-TasNet and the multi-singing mixture construction strategy, the proposed iSRNet achieved comparable performance to ideal time-frequency masks on duet and unison subsets of MedleyVox. Audio samples, the dataset, and codes are available on our website (https://github.com/jeonchangbin49/MedleyVox).
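As a rough illustration of what building a multi-singing mixture from single-singing recordings can look like, the following Python sketch scales each voice by a random gain and sums the results sample by sample. It is not the authors' construction strategy: the function names, the gain range of -3 dB to +3 dB, and the synthetic sine-wave "voices" are all assumptions made for this example.

```python
import math
import random


def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_voices(voices, gains_db):
    """Scale each voice by its gain and sum them sample-wise.

    Returns (mixture, scaled_voices). The scaled voices are the
    separation targets that correspond to the returned mixture.
    Longer voices are truncated to the shortest one.
    """
    if len(voices) != len(gains_db):
        raise ValueError("need exactly one gain per voice")
    length = min(len(v) for v in voices)
    scaled = [
        [db_to_amplitude(g) * s for s in v[:length]]
        for v, g in zip(voices, gains_db)
    ]
    mixture = [sum(samples) for samples in zip(*scaled)]
    return mixture, scaled


def sine(freq_hz, n_samples, sample_rate=16000):
    """Hypothetical stand-in for a single-singing recording."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the example is repeatable
voice_a = sine(220.0, 1600)
voice_b = sine(330.0, 1600)

# Assumed gain range; a real pipeline would pick its own.
gains = [rng.uniform(-3.0, 3.0) for _ in range(2)]
mixture, targets = mix_voices([voice_a, voice_b], gains)
```

Here `mixture` would be the input to a separation network and `targets` the references it is trained or evaluated against. In practice the voices would be loaded from single-singing datasets rather than synthesized.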
