Paper Title
Towards Domain Independence in CNN-based Acoustic Localization using Deep Cross Correlations
Paper Authors
Paper Abstract
Time delay estimation is essential in Acoustic Source Localization (ASL) systems. One of the most widely used techniques for this purpose is the Generalized Cross Correlation (GCC) between a pair of signals, together with its use in Steered Response Power (SRP) techniques, which estimate the acoustic power at a specific location. Nowadays, Deep Learning strategies may outperform these methods. However, they generally depend on the geometric and sensor configuration conditions available during training, and therefore generalize poorly to new environments unless re-training or adaptation is applied. In this work, we propose a method based on an encoder-decoder CNN architecture that is capable of outperforming the well-known SRP-PHAT algorithm, as well as other Deep Learning strategies, when working in mismatched training-testing conditions, without requiring model re-training. Our proposal aims to estimate a smoothed version of the correlation signals, which is then used to generate a refined acoustic power map, leading to better performance on the ASL task. Our experimental evaluation uses three publicly available realistic datasets and provides a comparison with the SRP-PHAT algorithm and other recent Deep Learning proposals.
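For readers unfamiliar with the GCC baseline mentioned in the abstract, the following is a minimal sketch of time delay estimation with PHAT-weighted GCC between two microphone signals. It is not the method proposed in the paper; the function name `gcc_phat`, its parameters, the sampling rate, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration only, not code from the paper.
    Returns the estimated delay, the correlation values, and their lags.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep the phase, drop the magnitude.
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = (n - 1) // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index i corresponds to lag (i - max_shift).
    cc = np.roll(cc, max_shift)[: 2 * max_shift + 1]
    lags = np.arange(-max_shift, max_shift + 1)

    tau = lags[np.argmax(cc)] / fs
    return tau, cc, lags


# Synthetic example (assumed values): white noise delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
true_lag = 5
sig = np.concatenate((np.zeros(true_lag), ref[:-true_lag]))

tau, cc, lags = gcc_phat(sig, ref, fs)
estimated_lag = int(round(tau * fs))
print("estimated lag (samples):", estimated_lag)
```

In an SRP-style pipeline, correlation curves like `cc` from every microphone pair would be sampled at the delays implied by each candidate source position and summed to form an acoustic power map. According to the abstract, the proposed network estimates a smoothed version of these correlation signals before that map is built.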