QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design

Paper Title

QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design

Authors

Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang

Abstract

This technical report describes the details of our Task 1A submission to the DCASE 2021 challenge. The goal of the task is to design an acoustic scene classification system for device-imbalanced datasets under a constraint on model complexity. The report introduces four methods to achieve this goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing information that is useful for classification. Second, we design an efficient architecture, BC-ResNet-Mod, a modified version of the baseline architecture with a limited receptive field. Third, we exploit spectrogram-to-spectrogram translation from one device to multiple devices to augment the training data. Finally, we use three model compression schemes (pruning, quantization, and knowledge distillation) to reduce model complexity. The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0 KB of non-zero parameters. We extend this work in [1].
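The abstract describes Residual Normalization only as instance normalization combined with a shortcut path. The NumPy sketch below illustrates that idea under stated assumptions and is not the report's implementation: the function names, the shortcut weight `lam`, the input layout `(batch, channel, frequency, time)`, and the choice of normalization axes are all hypothetical choices made for illustration.

```python
import numpy as np


def instance_norm(x, axes=(2, 3), eps=1e-5):
    """Normalize x to zero mean and unit variance over the given axes.

    Assumption: x has shape (batch, channel, frequency, time) and the
    statistics are computed separately for each sample and channel.
    """
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def residual_norm(x, lam=0.1, axes=(2, 3), eps=1e-5):
    """Instance normalization plus a weighted shortcut (identity) path.

    Assumption: the shortcut is a scaled copy of the input, lam * x.
    The normalized branch removes per-instance statistics, while the
    shortcut lets part of the original signal pass through unchanged.
    """
    return lam * x + instance_norm(x, axes=axes, eps=eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical batch of 2 log-mel feature maps: 3 channels, 8 bins, 10 frames.
    features = rng.normal(loc=3.0, scale=2.0, size=(2, 3, 8, 10))
    out = residual_norm(features, lam=0.1)
    print(out.shape)
```

With `lam=0.0` the function reduces to plain instance normalization, and larger values of `lam` keep more of the unnormalized input. The `axes` argument is exposed because the report may compute its statistics over different dimensions than the default used here.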
