Paper Title

Speaker Diarization with Lexical Information

Authors

Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Abstract

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
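The idea of integrating two adjacency matrices before spectral clustering can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' published method: the abstract does not say how the matrices are built or combined, so this sketch uses clipped cosine similarity for the acoustic side, a hypothetical per-boundary speaker turn probability with a 0.5 threshold for the lexical side, an element-wise maximum for the integration, and a two-speaker split on the sign of the Fiedler vector in place of a full spectral clustering pipeline.

```python
import numpy as np


def acoustic_adjacency(embeddings):
    """Weighted adjacency from cosine similarity between segment embeddings.

    Assumption: similarities are clipped to [0, 1] and self-loops are removed.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    adj = np.clip(unit @ unit.T, 0.0, 1.0)
    np.fill_diagonal(adj, 0.0)
    return adj


def lexical_adjacency(turn_probs, threshold=0.5):
    """Link consecutive segments when a speaker turn between them is unlikely.

    turn_probs[i] is a hypothetical probability that the speaker changes
    between segment i and segment i + 1 (e.g. estimated from ASR output).
    """
    n = len(turn_probs) + 1
    adj = np.zeros((n, n))
    for i, p in enumerate(turn_probs):
        if p < threshold:
            adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj


def integrate(acoustic, lexical):
    """Combine the two graphs; element-wise maximum is an assumed choice."""
    return np.maximum(acoustic, lexical)


def spectral_bipartition(adj):
    """Two-way spectral split using the sign of the Fiedler vector."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)


if __name__ == "__main__":
    # Toy data: segments 0-2 belong to one speaker, 3-5 to another.
    embeddings = np.array(
        [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
    )
    turn_probs = [0.1, 0.1, 0.9, 0.1, 0.1]  # likely turn between segments 2 and 3
    combined = integrate(acoustic_adjacency(embeddings), lexical_adjacency(turn_probs))
    print(spectral_bipartition(combined))
```

The lexical graph only ever links neighbouring segments, so on its own it cannot relate segments that are far apart in time; combining it with the acoustic graph lets the clustering step use both kinds of evidence at once.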
