How to (virtually) train your speaker localizer

Paper title

How to (virtually) train your speaker localizer

Authors

Prerak Srivastava, Antoine Deleforge, Archontis Politis, Emmanuel Vincent

Abstract

Learning-based methods have become ubiquitous in speaker localization. Existing systems rely on simulated training sets for the lack of sufficiently large, diverse and annotated real datasets. Most room acoustics simulators used for this purpose rely on the image source method (ISM) because of its computational efficiency. This paper argues that carefully extending the ISM to incorporate more realistic surface, source and microphone responses into training sets can significantly boost the real-world performance of speaker localization systems. It is shown that increasing the training-set realism of a state-of-the-art direction-of-arrival estimator yields consistent improvements across three different real test sets featuring human speakers in a variety of rooms and various microphone arrays. An ablation study further reveals that every added layer of realism contributes positively to these improvements.
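For readers unfamiliar with the image source method (ISM) baseline that the abstract refers to, the following is a minimal sketch of the classic shoebox ISM using Allen & Berkley image indexing. It is not the authors' extended simulator. The sketch assumes one frequency-independent reflection coefficient `beta` shared by all six walls, an omnidirectional source and microphone, and arrivals rounded to the nearest sample. These are roughly the simplifications that the paper describes relaxing through more realistic surface, source, and microphone responses. The helper names `image_sources` and `ism_rir`, along with their parameters and defaults, are hypothetical.

```python
import itertools
import math

import numpy as np


def image_sources(src, room, max_order):
    """Yield (position, reflection_order) for every image of `src` in a shoebox room.

    Allen & Berkley indexing: along an axis of length L, the image with integers
    (n, q), q in {0, 1}, sits at (1 - 2q) * s + 2nL and has hit the wall at 0
    |n - q| times and the wall at L |n| times.
    """
    per_axis = []
    for s, length in zip(src, room):
        candidates = []
        for n in range(-max_order, max_order + 1):
            for q in (0, 1):
                coord = (1 - 2 * q) * s + 2 * n * length
                candidates.append((coord, abs(n - q) + abs(n)))
        per_axis.append(candidates)
    # Combine one candidate per axis and keep images up to the requested order.
    for (x, rx), (y, ry), (z, rz) in itertools.product(*per_axis):
        order = rx + ry + rz
        if order <= max_order:
            yield np.array([x, y, z]), order


def ism_rir(src, mic, room, beta, fs=16000, c=343.0, max_order=6, length=4096):
    """Room impulse response built from the image sources (hypothetical helper).

    Simplifications: one frequency-independent reflection coefficient `beta`
    for all walls, omnidirectional source and microphone, and each arrival
    rounded to the nearest sample instead of using a fractional delay.
    """
    mic = np.asarray(mic, dtype=float)
    h = np.zeros(length)
    for pos, order in image_sources(src, room, max_order):
        dist = float(np.linalg.norm(pos - mic))
        k = int(round(fs * dist / c))  # propagation delay in samples
        if k < length:
            # Spherical spreading 1 / (4 * pi * d), times one beta per wall hit.
            h[k] += beta ** order / (4 * math.pi * dist)
    return h
```

As an example, `ism_rir((2.0, 1.5, 1.2), (4.0, 2.5, 1.5), (6.0, 4.0, 3.0), beta=0.8)` places the direct path at sample 105, because the distance is about 2.26 m at 343 m/s and 16 kHz. All later samples come from wall reflections. In a more realistic simulator of the kind the abstract describes, the scalar `beta ** order` gain would presumably become per-wall, frequency-dependent filters, and the omnidirectional assumption would give way to source and microphone directivity.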
