Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

Paper Title

Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

Authors

Eleonora Grassucci, Gioia Mancini, Christian Brignone, Aurelio Uncini, Danilo Comminiello

Abstract

Spatial audio methods are attracting growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of Ambisonics microphones, each comprising four capsules that decompose the sound field into spherical harmonics. In this paper, we propose a dual quaternion representation of the spatial sound field acquired through an array of two First Order Ambisonics (FOA) microphones. The audio signals are encapsulated in a dual quaternion that leverages quaternion algebra properties to exploit correlations among them. This augmented representation with 6 degrees of freedom (6DOF) provides more accurate coverage of the sound field, resulting in more precise sound localization and a more immersive audio experience. We evaluate our approach on a sound event localization and detection (SELD) benchmark. We show that our dual quaternion SELD model with temporal convolution blocks (DualQSELD-TCN) outperforms real and quaternion-valued baselines thanks to our augmented representation of the sound field. Full code is available at: https://github.com/ispamm/DualQSELD-TCN.
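
To make the idea of encapsulating two FOA signals in a dual quaternion more concrete, the following is a minimal NumPy sketch of the underlying algebra. It is not taken from the authors' repository. The function names (`hamilton`, `dual_quat_mul`, `pack_foa_pair`), the `[w, x, y, z]` component order, and the choice of which microphone fills the real part and which the dual part are all assumptions made for illustration. A dual quaternion is written as q = q_r + ε q_d with ε² = 0, so the product of two dual quaternions has real part a_r b_r and dual part a_r b_d + a_d b_r, where each product is a Hamilton product.

```python
import numpy as np


def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [w, x, y, z]."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack(
        [
            pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw,
        ],
        axis=-1,
    )


def dual_quat_mul(a, b):
    """Product of dual quaternions stored as (..., 8): real part, then dual part."""
    a_r, a_d = a[..., :4], a[..., 4:]
    b_r, b_d = b[..., :4], b[..., 4:]
    real = hamilton(a_r, b_r)
    # The a_d * b_d term vanishes because epsilon squared is zero.
    dual = hamilton(a_r, b_d) + hamilton(a_d, b_r)
    return np.concatenate([real, dual], axis=-1)


def pack_foa_pair(foa_a, foa_b):
    """Stack two 4-channel FOA signals of shape (T, 4) into one (T, 8) signal.

    Assumption: microphone A fills the real part, microphone B the dual part.
    """
    return np.concatenate([foa_a, foa_b], axis=-1)


# Hypothetical usage with random stand-in data for two FOA recordings.
rng = np.random.default_rng(0)
signal = pack_foa_pair(rng.standard_normal((16, 4)), rng.standard_normal((16, 4)))
weight = rng.standard_normal(8)  # one dual quaternion "weight"
mixed = dual_quat_mul(weight, signal)  # broadcasts over the time axis
```

In a dual quaternion layer, a product of this kind would replace the real-valued multiplication between weights and inputs, so that all eight channels are mixed through shared components rather than independently. How the published model arranges its layers is described in the paper and repository, not in this sketch.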
