Paper Title
Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting
Paper Authors
Paper Abstract
One difficult problem in keyword spotting is how to miniaturize the model's memory footprint while maintaining high accuracy. Although convolutional neural networks have been shown to be effective for the small-footprint keyword spotting problem, they still need hundreds of thousands of parameters to achieve good performance. In this paper, we propose an efficient model based on depthwise separable convolution layers and squeeze-and-excitation blocks. Specifically, we replace the standard convolution with the depthwise separable convolution, which reduces the number of parameters of the standard convolution without significant performance degradation. We further improve the performance of the depthwise separable convolution by reweighting the output feature maps of the first convolution layer with a so-called squeeze-and-excitation block. We compare the proposed method with five representative models on two experimental settings of the Google Speech Commands dataset. Experimental results show that the proposed method achieves state-of-the-art performance. For example, it achieves a classification error rate of 3.29% with 72K parameters in the first experiment, significantly outperforming the comparison methods of similar model size. It also achieves an error rate of 3.97% with only 10K parameters, which is slightly better than the state-of-the-art comparison method of similar model size.
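The two ingredients of the abstract can be illustrated with a minimal NumPy sketch. The parameter-count functions show why a depthwise separable convolution is cheaper than a standard one, and `se_reweight` shows the squeeze-and-excitation idea: pool each channel to a scalar, pass the vector through a small bottleneck MLP with a sigmoid, and rescale the channels by the resulting gates. All shapes, the reduction ratio, and the function names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution learns one k x k x c_in filter
    # per output channel (biases omitted for simplicity).
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

def se_reweight(feature_maps, w1, b1, w2, b2):
    """Squeeze-and-excitation reweighting of a (C, H, W) tensor.

    w1: (C//r, C), w2: (C, C//r) -- bottleneck MLP weights with an
    assumed reduction ratio r; b1, b2 are the matching biases.
    """
    # Squeeze: global average pool, one scalar per channel.
    z = feature_maps.mean(axis=(1, 2))                # (C,)
    # Excitation: bottleneck MLP, ReLU then sigmoid gates in (0, 1).
    h = np.maximum(0.0, w1 @ z + b1)                  # (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))          # (C,)
    # Reweight: scale each channel's feature map by its gate.
    return feature_maps * s[:, None, None]

if __name__ == "__main__":
    # Example: a 3x3 convolution with 64 input and 64 output channels.
    std = standard_conv_params(3, 64, 64)        # 36864 parameters
    sep = depthwise_separable_params(3, 64, 64)  # 576 + 4096 = 4672
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

With these illustrative numbers the separable layer uses roughly 8x fewer parameters, which is the kind of saving that lets the model fit in tens of kilo-parameters instead of hundreds of thousands.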