Paper Title
From Sound Representation to Model Robustness
Paper Authors
Paper Abstract
In this paper, we investigate how different standard environmental sound representations (spectrograms) affect the recognition performance and adversarial robustness of a victim residual convolutional neural network. Averaged over experiments on three benchmark environmental sound datasets, we find that the ResNet-18 model outperforms other deep learning architectures, such as GoogLeNet and AlexNet, in terms of both classification accuracy and the number of trainable parameters. We therefore adopt this model as our front-end classifier for the subsequent investigations. We then measure how the settings used to generate more informative mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations affect this front-end model, comparing classification performance against adversarial robustness. Weighing the average budget allocated by the adversary against the cost of attack, we demonstrate an inverse relationship between recognition accuracy and model robustness against six attack algorithms. Moreover, our experimental results show that while the ResNet-18 model trained on DWT spectrograms achieves the highest recognition accuracy, attacking this model is relatively more costly for the adversary than attacking models trained on the other 2D representations.
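To make the notion of representation "settings" concrete, the following is a minimal sketch of how an STFT log-magnitude spectrogram could be computed with NumPy. The function name `stft_spectrogram`, the Hann window, and the values of `frame_len` and `hop` are illustrative assumptions; the abstract does not state which settings or tooling the authors used.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=1024, hop=256):
    """Return a log-magnitude STFT spectrogram of shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are illustrative settings, not values taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each frame, then magnitude in decibels.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10).T

# Synthetic 1 kHz tone sampled at 8 kHz (a stand-in for an environmental sound clip).
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sr / 1024  # convert the bin index back to a frequency in Hz
```

Changing `frame_len` trades frequency resolution for time resolution, and changing `hop` alters the number of frames. Both change the 2D input that a classifier such as ResNet-18 would see.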