Frame-based overlapping speech detection using Convolutional Neural Networks

Paper title

Frame-based overlapping speech detection using Convolutional Neural Networks

Authors

Midia Yousefi, John H. L. Hansen

Abstract

Naturalistic speech recordings usually contain speech signals from multiple speakers. This phenomenon can degrade the performance of speech technologies due to the complexity of tracing and recognizing individual speakers. In this study, we investigate the detection of overlapping speech on segments as short as 25 ms using Convolutional Neural Networks. We evaluate the detection performance using different spectral features, and show that pyknogram features outperform other commonly used speech features. The proposed system can predict overlapping speech with an accuracy of 84% and an F-score of 88% on a dataset of mixed speech generated from the GRID dataset.
