Paper Title
Active Learning for Sound Event Detection
Paper Authors
Paper Abstract
This paper proposes an active learning system for sound event detection (SED). It aims to maximize the accuracy of a learned SED model with limited annotation effort. The proposed system analyzes an initially unlabeled audio dataset, from which it selects sound segments for manual annotation. The candidate segments are generated by a proposed change-point detection approach, and the selection follows the principle of mismatch-first farthest-traversal. During the training of SED models, whole recordings are used as training inputs, preserving the long-term context of the annotated segments. The proposed system clearly outperforms reference methods on the two datasets used for evaluation (TUT Rare Sound 2017 and TAU Spatial Sound 2019). Training with recordings as context outperforms training with only the annotated segments, and mismatch-first farthest-traversal outperforms reference sample selection methods based on random sampling and uncertainty sampling. Remarkably, the required annotation effort can be greatly reduced on the dataset where target sound events are rare: by annotating only 2% of the training data, the achieved SED performance is similar to that of annotating all the training data.
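To make the selection principle concrete, below is a minimal sketch of mismatch-first farthest-traversal sampling as the abstract describes it: candidates where two predictors disagree are prioritized, and within each priority group the next pick is the segment farthest from everything already selected. This is an illustration under assumptions, not the authors' implementation: the segment embeddings, the Euclidean distance metric, the source of the mismatch flag, and the function name `select_for_annotation` are all placeholders.

```python
import numpy as np

def select_for_annotation(features, mismatch, budget):
    """Pick `budget` segment indices for manual annotation.

    features : (n, d) array, one feature embedding per candidate segment
               (assumption: any fixed-length embedding works here)
    mismatch : (n,) bool array, True where the mismatch signal fires,
               e.g. two predictors disagree on the segment's label
    budget   : number of segments the annotator will label
    """
    n = len(features)
    min_dist = np.full(n, np.inf)  # distance to nearest selected segment
    selected = []

    # Mismatch-first: exhaust disagreeing segments before the rest.
    for group in (np.flatnonzero(mismatch), np.flatnonzero(~mismatch)):
        pool = list(group)
        while pool and len(selected) < budget:
            # Farthest traversal: take the candidate farthest from all
            # previously selected segments (first pick is arbitrary,
            # since all distances start at infinity).
            idx = max(pool, key=lambda i: min_dist[i])
            selected.append(int(idx))
            pool.remove(idx)
            # Refresh each candidate's distance to its nearest pick.
            d = np.linalg.norm(features - features[idx], axis=1)
            np.minimum(min_dist, d, out=min_dist)
        if len(selected) >= budget:
            break
    return selected

# Toy usage: 100 candidate segments with 16-dim embeddings, 30 of which
# carry a mismatch flag; request 10 segments for annotation.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
flags = np.zeros(100, dtype=bool)
flags[:30] = True
picks = select_for_annotation(feats, flags, budget=10)
```

The farthest-first rule spreads the picks across feature space, so the annotation budget is not wasted on near-duplicate segments; exhausting the mismatch group first reflects the two-level priority named in the abstract.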