Paper title
Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs
Paper authors
Abstract
Sound Event Localization and Detection (SELD) is a machine listening problem whose objective is to recognize individual sound events, detect their temporal activity, and estimate their spatial location. Thanks to the emergence of more hard-labeled audio datasets, deep learning techniques have become the state-of-the-art solutions. The most common approaches implement a convolutional recurrent neural network (CRNN) after first transforming the audio signal into a multichannel 2D representation. The squeeze-excitation technique can be regarded as a convolution enhancement that aims to learn spatial and channel feature maps independently, rather than jointly as standard convolutions do. This is usually achieved by combining global pooling operators, linear operators, and a final recalibration between the block input and its learned relationships. This work aims to improve the accuracy of the baseline CRNN presented in DCASE 2020 Task 3 by adding residual squeeze-excitation (SE) blocks to the convolutional part of the CRNN. The procedure consists of a grid search over the ratio parameter of the residual SE block (used in the linear relationships), while the remaining hyperparameters of the network are kept the same as in the baseline. Experiments show that simply introducing the residual SE blocks improves considerably on the baseline results.
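The three ingredients named in the abstract (a global operator, linear operators, and a final recalibration of the block input) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `se_residual_block`, the weight names `w1` and `w2`, the tensor layout, and the exact placement of the shortcut are assumptions made for illustration, and the convolutional layers that a full residual block would contain are omitted.

```python
import numpy as np

def se_residual_block(x, w1, w2):
    """Hypothetical sketch. x: (C, T, F) feature maps;
    w1: (C // ratio, C) and w2: (C, C // ratio) are the two linear maps."""
    # Squeeze: global average pooling over the time-frequency axes -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck of two linear maps, ReLU then sigmoid -> (C,)
    h = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))
    # Recalibration: scale each channel by its gate, then add the shortcut
    # (assumed placement; convolutions of a full block are left out)
    return x + x * s[:, None, None]

rng = np.random.default_rng(0)
C, T, F, ratio = 8, 4, 5, 4  # illustrative sizes, not taken from the paper
x = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // ratio, C)) * 0.1
w2 = rng.standard_normal((C, C // ratio)) * 0.1
y = se_residual_block(x, w1, w2)
```

In this sketch the ratio parameter sets the width of the bottleneck between the two linear maps (`C // ratio` units), so a grid search over it trades the capacity of the channel gate against the number of added weights.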