Paper Title

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training

Authors

Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu

Abstract

In this paper, we propose a method called Hodge and Podge for sound event detection. We demonstrate Hodge and Podge on the dataset of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge Task 4. This task aims to predict the presence or absence and the onset and offset times of sound events in home environments. Sound event detection is challenging due to the lack of large-scale real strongly labeled data. Recently, deep semi-supervised learning (SSL) has proven to be effective in modeling with weakly labeled and unlabeled data. This work explores how to extend deep SSL to obtain a new, state-of-the-art sound event detection method called Hodge and Podge. With a convolutional recurrent neural network (CRNN) as the backbone, first, a multi-scale squeeze-excitation mechanism is introduced and added to form a pyramid squeeze-excitation CRNN. The pyramid squeeze-excitation layer addresses the issue that different sound events have different durations, and adaptively recalibrates channel-wise spectrogram responses. Further, to remedy the lack of real strongly labeled data, we propose multi-hot MixMatch and composition consistency training with temporal-frequency augmentation. Our experiments on the public DCASE 2019 Challenge Task 4 validation data yield an event-based F-score of 43.4\%, about 1.6\% absolute better than the state-of-the-art methods in the challenge, while the F-score of the official baseline is 25.8\%.
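The multi-hot MixMatch idea in the abstract (mixing pairs of examples together with their multi-hot label guesses, and sharpening the guessed class probabilities) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the per-class Bernoulli sharpening, the temperature `T=0.5`, and the `Beta(0.75, 0.75)` mixing coefficient are assumptions chosen by analogy with the original single-label MixMatch recipe.

```python
import numpy as np

def sharpen(p, T=0.5):
    """Sharpen a multi-hot label guess. Each class probability is treated
    as an independent Bernoulli and pushed toward 0 or 1 (an assumption
    for the multi-hot case); T is the sharpening temperature."""
    pos = p ** (1.0 / T)
    neg = (1.0 - p) ** (1.0 / T)
    return pos / (pos + neg)

def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """MixMatch-style mixup of two (spectrogram, multi-hot label) pairs.
    lam ~ Beta(alpha, alpha), clamped to >= 0.5 so the mixed example
    stays closer to the first input, as in MixMatch."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

Sharpening leaves confident guesses (0 or 1) unchanged and pushes intermediate probabilities such as 0.7 closer to 1, which is what lets the unlabeled-data label guesses act as training targets.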
