Differentiable Dictionary Search: Integrating Linear Mixing with Deep Non-Linear Modelling for Audio Source Separation

Paper Title

Differentiable Dictionary Search: Integrating Linear Mixing with Deep Non-Linear Modelling for Audio Source Separation

Authors

Marták, Lukáš Samuel; Kelz, Rainer; Widmer, Gerhard

Abstract

This paper describes several improvements to a new method for signal decomposition that we recently formulated under the name of Differentiable Dictionary Search (DDS). The fundamental idea of DDS is to exploit a class of powerful deep invertible density estimators called normalizing flows to model the dictionary in a linear decomposition method such as NMF. This effectively creates a bijection between the space of dictionary elements and the associated probability space, allowing a differentiable search through the dictionary space, guided by the estimated densities. As the initial formulation was a proof of concept with some practical limitations, we present several steps towards making it scalable, hoping to improve both the computational complexity of the method and its signal decomposition capabilities. As a testbed for experimental evaluation, we choose the task of frame-level piano transcription, where the signal is to be decomposed into sources whose activity is attributed to individual piano notes. To highlight the impact of improved non-linear modelling of sources, we compare variants of our method to a linear overcomplete NMF baseline. Experimental results show that even in the absence of additional constraints, our models produce increasingly sparse and precise decompositions, according to two pertinent evaluation measures.
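The abstract describes DDS only at a high level. The toy sketch below illustrates the general idea of searching a dictionary through the latent side of a bijection, written under our own assumptions and not the authors' implementation. A fixed elementwise softplus map (the hypothetical `latent_to_spectrum`) stands in for a trained normalizing flow. A standard-normal penalty on the latent codes stands in for the learned density. The activations are updated with the standard Euclidean NMF multiplicative rule of Lee and Seung. All function names and parameter values here are illustrative.

```python
import numpy as np


def latent_to_spectrum(z):
    """Hypothetical stand-in for the inverse of a trained flow: an elementwise
    softplus bijection from R onto the positive reals, so every latent code
    maps to a valid non-negative dictionary element."""
    return np.logaddexp(0.0, z)  # log(1 + exp(z)), computed stably


def latent_to_spectrum_grad(z):
    """Derivative of softplus, i.e. the logistic sigmoid (tanh form avoids overflow)."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def dictionary_search(X, n_sources, n_iter=300, lr=1e-2, prior_weight=1e-3, seed=0):
    """Fit X ~= g(Z) @ H by gradient search over latent codes Z (F x K) and
    multiplicative updates of non-negative activations H (K x T).

    Objective: 0.5 * ||g(Z) H - X||_F^2 + 0.5 * prior_weight * ||Z||_F^2,
    where the second term is a standard-normal log-prior on Z up to a constant
    (a simplified assumption, not the full flow density).
    Returns the dictionary g(Z), the activations H and the objective history.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    Z = rng.standard_normal((n_freq, n_sources))
    H = rng.random((n_sources, n_frames))
    history = []
    for _ in range(n_iter):
        W = latent_to_spectrum(Z)
        R = W @ H - X  # reconstruction residual
        history.append(0.5 * np.sum(R**2) + 0.5 * prior_weight * np.sum(Z**2))
        # Chain rule through g: dL/dZ = (R H^T) * g'(Z) + prior_weight * Z
        grad_Z = (R @ H.T) * latent_to_spectrum_grad(Z) + prior_weight * Z
        Z -= lr * grad_Z
        # Euclidean NMF multiplicative update; keeps H non-negative
        W = latent_to_spectrum(Z)
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    return latent_to_spectrum(Z), H, history


# Usage on a small synthetic non-negative mixture (made-up data).
rng = np.random.default_rng(1)
X = rng.random((8, 3)) @ rng.random((3, 20))
W, H, history = dictionary_search(X, n_sources=3)
rel_err = np.linalg.norm(W @ H - X) / np.linalg.norm(X)
print(f"objective {history[0]:.3f} -> {history[-1]:.3f}, relative error {rel_err:.3f}")
```

Optimizing a non-negative dictionary directly, with no latent bijection and no prior, reduces this to ordinary linear NMF, the family to which the paper's overcomplete baseline belongs.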
