Learning Frame Level Attention for Environmental Sound Classification

Paper title

Learning Frame Level Attention for Environmental Sound Classification

Authors

Zhichao Zhang, Shugong Xu, Shunqing Zhang, Tianhao Qiao, Shan Cao

Abstract

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. Classification performance depends heavily on how effective the representative features extracted from the environmental sounds are. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model that focuses on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. We then extend this convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigate the classification performance when using different attention scaling functions and when applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. The results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualize the attention results and observe that the proposed attention mechanism leads the network to focus on the semantically relevant parts of environmental sounds.
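The general idea of frame-level attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear scoring vector `w`, the bias `b`, the function name `frame_attention`, and the choice of softmax and sigmoid as the two scaling functions are all assumptions made for illustration, and the toy frame features stand in for the output of a convolutional RNN.

```python
import math

def softmax(scores):
    # Normalize scores so the frame weights sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    # Squash each score independently into (0, 1).
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def frame_attention(frames, w, b=0.0, scaling=softmax):
    """Weight T frame-level feature vectors and pool them over time.

    frames  : list of T feature vectors (one per time frame)
    w, b    : hypothetical learned scoring parameters (assumed, not from the paper)
    scaling : function mapping raw scores to attention weights
    """
    # One scalar relevance score per frame.
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) + b for frame in frames]
    weights = scaling(scores)
    dim = len(frames[0])
    # Attention-weighted sum of the frames along the time axis.
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames))
              for d in range(dim)]
    return weights, pooled

# Toy input: a silent frame, a salient frame, and a near-silent frame.
frames = [[0.0, 0.0], [1.0, 2.0], [0.1, 0.0]]
w = [1.0, 1.0]

weights, pooled = frame_attention(frames, w)
sig_weights, sig_pooled = frame_attention(frames, w, scaling=sigmoid)
```

With softmax scaling the weights compete across frames, so the salient frame dominates the pooled feature. With sigmoid scaling each frame is gated independently, which is one way the choice of scaling function could change behavior.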
