Sound Event Localization Based on Sound Intensity Vector Refined by DNN-Based Denoising and Source Separation

Paper title

Sound Event Localization Based on Sound Intensity Vector Refined by DNN-Based Denoising and Source Separation

Authors

Masahiro Yasuda, Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Keisuke Imoto

Abstract

We propose a direction-of-arrival (DOA) estimation method for sound event localization and detection (SELD). Direct estimation of DOA using a deep neural network (DNN), i.e., a completely data-driven approach, achieves high accuracy. However, because such approaches cannot incorporate physical knowledge, there is a gap in accuracy between DOA estimation for single sources and for overlapping sources. Physics-based approaches, by contrast, are less accurate than DNN-based approaches but are robust to overlapping sources. In this study, we consider a combination of the two: the sound intensity vectors (IVs) used for physics-based DOA estimation are refined by DNN-based denoising and source separation. This method enables accurate DOA estimation for both single and overlapping sources using a spherical microphone array. Experimental results show that the proposed method achieves state-of-the-art DOA estimation accuracy on an open SELD dataset.
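
As a rough illustration of the physics-based step the abstract refers to, the sketch below estimates a DOA from a sound intensity vector computed on first-order Ambisonics (FOA) spectrograms. It is not the authors' implementation, and the DNN-based denoising and source separation that refine the IVs are not shown. The function name `intensity_vector_doa`, the channel names `W`, `X`, `Y`, `Z`, and the encoding convention (directional channels equal to the omnidirectional channel scaled by the unit vector pointing toward the source) are assumptions made for this example.

```python
import numpy as np


def intensity_vector_doa(W, X, Y, Z):
    """Estimate per-frame azimuth and elevation (radians) from FOA STFTs.

    W, X, Y, Z: complex arrays of identical shape (freq_bins, frames).
    Assumed convention: X, Y, Z equal W scaled by the components of the
    unit vector pointing toward the source.
    """
    # Intensity vector (up to a constant scale) per time-frequency bin:
    # real part of conj(omni channel) times each directional channel.
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)
    Iz = np.real(np.conj(W) * Z)

    # Sum over frequency to get one intensity vector per time frame.
    ix, iy, iz = Ix.sum(axis=0), Iy.sum(axis=0), Iz.sum(axis=0)

    # Convert each Cartesian vector to azimuth and elevation angles.
    azimuth = np.arctan2(iy, ix)
    elevation = np.arctan2(iz, np.hypot(ix, iy))
    return azimuth, elevation


if __name__ == "__main__":
    # Synthetic single source at 60 degrees azimuth, 20 degrees elevation.
    rng = np.random.default_rng(0)
    az, el = np.deg2rad(60.0), np.deg2rad(20.0)
    s = rng.standard_normal((64, 10)) + 1j * rng.standard_normal((64, 10))
    W = s
    X = s * np.cos(az) * np.cos(el)
    Y = s * np.sin(az) * np.cos(el)
    Z = s * np.sin(el)
    est_az, est_el = intensity_vector_doa(W, X, Y, Z)
    print(np.rad2deg(est_az[0]), np.rad2deg(est_el[0]))
```

With noise or several overlapping sources, the per-bin intensity vectors no longer all point toward one source. That is the situation the abstract says the DNN-based denoising and source separation are meant to address.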
