Probabilistic Embeddings for Speaker Diarization

Title

Probabilistic embeddings for speaker diarization

Authors

Silnova, Anna, Brümmer, Niko, Rohdin, Johan, Stafylakis, Themos, Burget, Lukáš

Abstract

Speaker embeddings (x-vectors) extracted from very short segments of speech have recently been shown to give competitive performance in speaker diarization. We generalize this recipe by extracting from each speech segment, in parallel with the x-vector, also a diagonal precision matrix, thus providing a path for the propagation of information about the quality of the speech segment into a PLDA scoring backend. These precisions quantify the uncertainty about what the values of the embeddings might have been if they had been extracted from high-quality speech segments. The proposed probabilistic embeddings (x-vectors with precisions) are interfaced with the PLDA model by treating the x-vectors as hidden variables and marginalizing them out. We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to do diarization on the DIHARD'19 evaluation set. We compute the full PLDA likelihood 'by the book' for each clustering hypothesis that is considered by AHC. We do joint discriminative training of the PLDA parameters and of the probabilistic x-vector extractor. We demonstrate accuracy gains relative to a baseline AHC algorithm that is applied to traditional x-vectors (without uncertainty) and uses averaging of binary log-likelihood ratios rather than by-the-book scoring.
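The abstract's two central ideas, marginalizing a hidden x-vector under per-segment diagonal precisions and scoring each AHC merge with the full ('by the book') PLDA likelihood instead of averaged pairwise LLRs, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simplified two-covariance PLDA with diagonal between-speaker variance `b` and within-speaker variance `w` (so every dimension is independent), no prior over partitions, no discriminative training, and a merge threshold of 0. All function names (`cluster_loglik`, `merge_llr`, `avg_pairwise_llr`, `ahc`) and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# One speech segment: (embedding r, diagonal precision p), both of length D.
Segment = Tuple[List[float], List[float]]


def cluster_loglik(segs: List[Segment], b: Sequence[float], w: Sequence[float]) -> float:
    """Log-likelihood that all segments in `segs` come from a single speaker.

    Assumed diagonal PLDA: speaker variable z_d ~ N(0, b_d); the hidden clean
    x-vector is z_d plus within-speaker noise of variance w_d; the observed
    embedding adds noise of variance 1/p_d. Marginalizing the hidden x-vector
    gives a per-segment variance v = w_d + 1/p_d; z_d is then integrated out
    in closed form.
    """
    total = 0.0
    for d in range(len(b)):
        a = s = quad = logdet = 0.0
        for r, p in segs:
            v = w[d] + 1.0 / p[d]            # p_d = inf gives v = w_d (no uncertainty)
            a += 1.0 / v                      # accumulated precision
            s += r[d] / v                     # precision-weighted sum
            quad += r[d] ** 2 / v
            logdet += math.log(2.0 * math.pi * v)
        post_prec = 1.0 / b[d] + a            # posterior precision of z_d
        total += -0.5 * (math.log(1.0 + a * b[d]) + logdet + quad - s * s / post_prec)
    return total


def merge_llr(A: List[Segment], B: List[Segment], b, w) -> float:
    """Change in the total partition log-likelihood if clusters A and B are merged."""
    return cluster_loglik(A + B, b, w) - cluster_loglik(A, b, w) - cluster_loglik(B, b, w)


def avg_pairwise_llr(A: List[Segment], B: List[Segment], b, w) -> float:
    """Baseline-style score: mean binary LLR over segment pairs, precisions ignored."""
    no_unc = [math.inf] * len(b)
    llrs = [merge_llr([(ra, no_unc)], [(rb, no_unc)], b, w) for ra, _ in A for rb, _ in B]
    return sum(llrs) / len(llrs)


def ahc(segs: List[Segment], b, w, threshold: float = 0.0,
        score: Optional[Callable[[List[Segment], List[Segment]], float]] = None):
    """Greedy AHC: repeatedly merge the best-scoring pair while its score exceeds threshold."""
    if score is None:
        score = lambda A, B: merge_llr(A, B, b, w)
    clusters = [[s] for s in segs]
    while len(clusters) > 1:
        best, bi, bj = -math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sc = score(clusters[i], clusters[j])
                if sc > best:
                    best, bi, bj = sc, i, j
        if best <= threshold:
            break
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]                      # bj > bi, so index bi stays valid
    return clusters


b, w = [1.0], [0.1]                           # toy 1-D model parameters
segs = [([1.0], [10.0]), ([1.2], [10.0]), ([-1.0], [10.0]), ([-0.9], [10.0])]
print([[r[0] for r, _ in c] for c in ahc(segs, b, w)])  # [[1.0, 1.2], [-1.0, -0.9]]
```

Because the likelihood of a partition factors into a product of per-cluster likelihoods, `merge_llr` is exactly the change in that total when two clusters are joined, so each greedy step compares full hypotheses rather than averaged pair scores. The precision path is visible directly: a segment with a very small precision (say `p = [1e-6]`) gets an effective variance so large that its merge LLR against any cluster is close to zero, whereas the same embedding with high precision would be firmly accepted or rejected. Passing `score=lambda A, B: avg_pairwise_llr(A, B, b, w)` to `ahc` gives a rough analogue of the baseline described above.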
