Topological Data Analysis for Speech Processing

Paper title

Topological Data Analysis for Speech Processing

Authors

Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev

Abstract

We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ accuracy and $5\%$ ERR on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art performance with an accuracy of $80.155$. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available at https://topohubert.github.io/speech-topology-webpages/
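
As a rough illustration of what a topological feature of an attention map can look like, the sketch below computes the finite $H_0$ persistence bars of a weighted graph built from a single attention matrix. It is not the authors' feature set, which is described in the paper and its appendices. The edge weight $1 - \max(a_{ij}, a_{ji})$, the function names `h0_bar_lengths` and `h0_features`, and the toy matrix are all assumptions made for this example. It relies on the standard fact that, when every vertex is born at filtration value 0, the finite $H_0$ bars end exactly at the edge weights of a minimum spanning tree.

```python
def h0_bar_lengths(attention):
    """Finite H0 bar lengths for a graph filtration built from one attention map.

    Assumption (illustrative, not taken from the paper): the distance between
    tokens i and j is 1 - max(a[i][j], a[j][i]). All vertices are born at 0,
    so each finite H0 bar has length equal to a minimum-spanning-tree edge weight.
    """
    n = len(attention)
    edges = sorted(
        (1.0 - max(attention[i][j], attention[j][i]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for weight, i, j in edges:  # Kruskal's algorithm over edges sorted by weight
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one H0 bar ends here
            bars.append(weight)
    return bars


def h0_features(attention):
    """Summary statistics of the H0 barcode (hypothetical feature names)."""
    bars = h0_bar_lengths(attention)
    return {
        "h0_sum": sum(bars),
        "h0_max": max(bars, default=0.0),
        "h0_count": len(bars),
    }


# Toy 3-token attention map (rows sum to 1); the values are made up.
toy_attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.1, 0.6],
]
features = h0_features(toy_attention)
```

In a pipeline of the kind the abstract outlines, summaries like these would be computed per attention head and layer, concatenated into one vector per utterance, and passed to a linear classifier; that last step is omitted here.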
