DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

Paper Title

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

Authors

Lin, Qingjian; Cai, Weicheng; Yang, Lin; Wang, Junjie; Zhang, Jun; Li, Ming

Abstract

In this paper, we present the system submitted by the DKU-LENOVO team to the Second DIHARD Speech Diarization Challenge. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to improve performance. Our final submission employs ResNet-LSTM-based VAD, deep ResNet-based speaker embeddings, LSTM-based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage, and overlap detection brings a further slight improvement. Our proposed system achieves a diarization error rate (DER) of 18.84% in Track 1 and 27.90% in Track 2. Although our systems reduce the DER by 27.5% and 31.7% relative to the official baselines, we believe that the diarization task is still very difficult.
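The clustering stage named above can be illustrated with a minimal spectral clustering sketch. The abstract does not say which spectral clustering variant the team used, so everything here is an assumption: a symmetric normalized Laplacian, an eigengap heuristic to estimate the number of speakers, and a plain k-means on the row-normalized eigenvectors. The function names `spectral_cluster` and `kmeans` and the parameter `max_speakers` are hypothetical, and the toy affinity matrix merely stands in for the pairwise segment scores that the LSTM-based similarity module would produce.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen center
        dist = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(dist))])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center
        sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(affinity: np.ndarray, max_speakers: int = 8) -> np.ndarray:
    """Assign a speaker label to each segment from a pairwise affinity matrix.

    Assumed recipe (not confirmed by the paper): normalized Laplacian,
    eigengap-based speaker count, k-means on the spectral embedding.
    """
    n = len(affinity)
    if n < 2:
        return np.zeros(n, dtype=int)
    a = (affinity + affinity.T) / 2.0   # enforce symmetry
    np.fill_diagonal(a, 0.0)            # ignore self-similarity
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(a.sum(axis=1), 1e-10))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Eigengap heuristic: the speaker count is where the largest jump occurs
    upper = min(max_speakers, n - 1)
    k = int(np.argmax(np.diff(eigvals[: upper + 1]))) + 1
    # Spectral embedding: k smallest eigenvectors, each row unit-normalized
    emb = eigvecs[:, :k]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-10)
    return kmeans(emb, k)


if __name__ == "__main__":
    # Toy affinity: segments 0-2 come from one speaker, 3-5 from another
    toy = np.full((6, 6), 0.1)
    toy[:3, :3] = 0.9
    toy[3:, 3:] = 0.9
    print(spectral_cluster(toy).tolist())  # [0, 0, 0, 1, 1, 1]
```

Estimating the speaker count from the eigengap means the number of clusters need not be known in advance, which matters for DIHARD recordings with varying numbers of speakers. A real system would also tune or prune the affinity matrix before clustering, which this sketch omits.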