Multi-scale temporal-frequency attention for music source separation

Paper title

Multi-scale temporal-frequency attention for music source separation

Authors

Chen, Lianwu; Zheng, Xiguang; Zhang, Chen; Guo, Liang; Yu, Bing

Abstract

In recent years, approaches based on deep neural networks (DNNs) have achieved state-of-the-art performance in music source separation (MSS). Although previous methods have addressed large-receptive-field modeling in various ways, the temporal and frequency correlations of music spectrograms, which contain repeated patterns, have not been explicitly explored for the MSS task. In this paper, a temporal-frequency attention module is proposed to model spectrogram correlations along both the temporal and frequency dimensions. Moreover, a multi-scale attention mechanism is proposed to capture these correlations in music signals more effectively. Experimental results on the MUSDB18 dataset show that the proposed method outperforms existing state-of-the-art systems in separating vocal stems, the primary practical application of MSS, achieving a signal-to-distortion ratio (SDR) of 9.51 dB.
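The abstract does not describe the module's internals. As a rough illustration only, the NumPy sketch below applies parameter-free scaled dot-product self-attention separately along the time axis and the frequency axis of a (channels, frames, frequency bins) feature map, so that repeated patterns in one frame or bin can inform others. The multi-scale part (average-pooling frames at several rates before attending) and the residual sum are hypothetical choices; the function names are invented for this example, and a real model would add learned query/key/value projections. This is not the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_self_attention(spec, axis):
    """Parameter-free scaled dot-product self-attention along one axis.

    spec: feature map of shape (C, T, F). Each time-frequency bin is described
    by its C-dimensional channel vector. With axis="time", frames attend to
    other frames within the same frequency bin; with axis="frequency", bins
    attend to other bins within the same frame.
    """
    c = spec.shape[0]
    x = np.moveaxis(spec, 0, -1)              # (T, F, C)
    if axis == "time":
        x = np.swapaxes(x, 0, 1)              # (F, T, C): attend over T per bin
    elif axis != "frequency":
        raise ValueError("axis must be 'time' or 'frequency'")
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(c)   # (batch, L, L)
    out = softmax(scores, axis=-1) @ x                  # (batch, L, C)
    if axis == "time":
        out = np.swapaxes(out, 0, 1)          # back to (T, F, C)
    return np.moveaxis(out, -1, 0)            # back to (C, T, F)


def multi_scale_temporal_attention(spec, scales=(1, 2, 4)):
    """Temporal attention at several time resolutions (hypothetical scheme).

    For each scale s, frames are average-pooled in groups of s, attended over,
    then repeated back to the original frame rate; the results are averaged.
    """
    c, t, f = spec.shape
    outs = []
    for s in scales:
        t_trim = (t // s) * s                 # drop the ragged tail for pooling
        pooled = spec[:, :t_trim].reshape(c, t_trim // s, s, f).mean(axis=2)
        up = np.repeat(axis_self_attention(pooled, "time"), s, axis=1)
        if t_trim < t:                        # pad the tail with the last output frame
            tail = np.repeat(up[:, -1:], t - t_trim, axis=1)
            up = np.concatenate([up, tail], axis=1)
        outs.append(up)
    return np.mean(outs, axis=0)


def temporal_frequency_attention(spec, scales=(1, 2, 4)):
    # Residual sum of multi-scale temporal and frequency attention (assumed design)
    return (spec
            + multi_scale_temporal_attention(spec, scales)
            + axis_self_attention(spec, "frequency"))


rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 100, 64))      # (channels, frames, frequency bins)
print(temporal_frequency_attention(spec).shape)   # (8, 100, 64)
```

Attending along each axis separately keeps the score matrices at T x T and F x F rather than (T*F) x (T*F), which is one common reason to factorize attention over a spectrogram.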
