Paper Title

Topological Data Analysis for Speech Processing

Authors

Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev

Abstract

We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of these features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ in accuracy and $5\%$ in ERR on four common datasets; on CREMA-D, the proposed feature set reaches new state-of-the-art performance with an accuracy of $80.155$. We also show that topological features can reveal the functional roles of speech Transformer heads; for example, we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or between voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available at https://topohubert.github.io/speech-topology-webpages/
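To make the idea of a topological feature of an attention map more concrete, here is a minimal sketch, not the authors' actual feature set. It treats one attention matrix as a weighted graph and computes the finite $H_0$ persistence bar lengths, which for this kind of filtration coincide with the edge weights of a minimum spanning tree. The function name `h0_bar_lengths`, the symmetrization by element-wise maximum, and the distance `1 - weight` are all assumptions made for illustration, and the toy matrix stands in for a real HuBERT attention head.

```python
import numpy as np


def h0_bar_lengths(attention):
    """Finite H0 bar lengths of the graph induced by a square attention matrix.

    Assumptions (illustrative only): the matrix is symmetrized with an
    element-wise maximum, and edge distance is defined as 1 - weight.
    """
    a = np.asarray(attention, dtype=float)
    n = a.shape[0]
    dist = 1.0 - np.maximum(a, a.T)

    # Enumerate every edge i < j and sort edges by increasing distance.
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dist[rows, cols], kind="stable")

    # Union-find: each merge of two components ends one H0 bar.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(float(dist[i, j]))
            if len(deaths) == n - 1:
                break
    return deaths


# Toy 3-token attention map (each row sums to 1), not real model output.
toy_attention = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
])
bars = h0_bar_lengths(toy_attention)
feature = sum(bars)  # one scalar feature for this head
```

Under these assumptions, one such scalar per attention head could be stacked into a feature vector and passed to a linear classifier, which is the general shape of the pipeline the abstract describes.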
