MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition

Paper title

MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition

Authors

Somshubra Majumdar, Boris Ginsburg

Abstract

We present MatchboxNet, an end-to-end neural network for speech command recognition. MatchboxNet is a deep residual network composed of blocks of 1D time-channel separable convolution, batch normalization, ReLU, and dropout layers. MatchboxNet reaches state-of-the-art accuracy on the Google Speech Commands dataset while having significantly fewer parameters than similar models. The small footprint of MatchboxNet makes it an attractive candidate for devices with limited computational resources. The model is highly scalable, so its accuracy can be improved with modest additional memory and compute. Finally, we show how intensive data augmentation using an auxiliary noise dataset improves robustness in the presence of background noise.
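To give a feel for why a time-channel separable convolution keeps the parameter count low, the sketch below compares weight counts for a standard 1D convolution and a separable one, taking "time-channel separable" to mean a depthwise convolution along time followed by a pointwise (kernel size 1) convolution across channels. The channel count (64) and kernel size (13) are illustrative assumptions, not values taken from the abstract, and bias terms are ignored.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a standard 1D convolution (bias ignored)."""
    return c_in * c_out * kernel


def separable_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a depthwise-over-time plus pointwise-over-channels pair."""
    depthwise = c_in * kernel   # one length-`kernel` filter per input channel
    pointwise = c_in * c_out    # kernel-size-1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, kernel = 64, 64, 13

standard = standard_conv1d_params(c_in, c_out, kernel)
separable = separable_conv1d_params(c_in, c_out, kernel)

print(f"standard:  {standard} weights")          # 53248
print(f"separable: {separable} weights")         # 4928
print(f"reduction: {standard / separable:.1f}x")  # 10.8x
```

Under these assumed sizes the separable layer needs roughly a tenth of the weights, and the saving grows with the kernel size, since the kernel length multiplies only the small depthwise term.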
