Paper Title
Content Based Singing Voice Extraction From a Musical Mixture
Paper Authors
Paper Abstract
We present a deep-learning-based methodology for extracting the singing voice signal from a musical mixture based on the underlying linguistic content. Our model follows an encoder-decoder architecture and takes as input the magnitude component of the spectrogram of a musical mixture with vocals. The encoder part of the model is trained via knowledge distillation using a teacher network to learn a content embedding, which is decoded to generate the corresponding vocoder features. Using this methodology, we are able to extract the unprocessed raw vocal signal from the mixture, even for a processed mixture dataset with singers not seen during training. While the nature of our system makes it incongruous with traditional objective evaluation metrics, we use subjective evaluation via listening tests to compare the methodology to state-of-the-art deep-learning-based source separation algorithms. We also provide sound examples and source code for reproducibility.
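To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the training setup, not the authors' implementation. The layer sizes, the 64-dimensional content embedding, the number of vocoder features, and the assumption that the frozen teacher derives its content embedding from the clean vocal spectrogram are all illustrative choices; only the overall structure (mixture spectrogram → encoder → content embedding → decoder → vocoder features, with a distillation loss against the teacher) follows the abstract.

```python
# Hypothetical sketch of the encoder-decoder with knowledge distillation
# described in the abstract. All dimensions and the teacher's input are
# illustrative assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps magnitude spectrogram frames to a per-frame content embedding."""

    def __init__(self, n_bins=513, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, mag_spec):            # (batch, frames, n_bins)
        return self.net(mag_spec)           # (batch, frames, emb_dim)


class Decoder(nn.Module):
    """Decodes the content embedding into per-frame vocoder features."""

    def __init__(self, emb_dim=64, n_vocoder=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(),
            nn.Linear(256, n_vocoder),
        )

    def forward(self, emb):
        return self.net(emb)


def training_step(encoder, decoder, teacher, mix_spec, clean_voice_spec,
                  target_vocoder, opt):
    """One optimisation step: a distillation loss pulls the student's
    embedding towards the frozen teacher's, while a regression loss trains
    the decoder to predict the target vocoder features of the clean voice."""
    with torch.no_grad():
        teacher_emb = teacher(clean_voice_spec)   # frozen teacher network
    student_emb = encoder(mix_spec)               # student sees only the mixture
    pred_vocoder = decoder(student_emb)
    loss = (nn.functional.mse_loss(student_emb, teacher_emb)          # distillation
            + nn.functional.mse_loss(pred_vocoder, target_vocoder))   # reconstruction
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Example wiring (optimiser and teacher stand-in are illustrative):
# teacher = Encoder()                       # placeholder for the pre-trained teacher
# encoder, decoder = Encoder(), Decoder()
# opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
```

At inference time, only the encoder and decoder would be needed: the mixture spectrogram is encoded to the content embedding and decoded to vocoder features, from which the voice signal is synthesised with a vocoder.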