Speaker conditioned acoustic-to-articulatory inversion using x-vectors

Paper title

Speaker conditioned acoustic-to-articulatory inversion using x-vectors

Authors

Aravind Illa, Prasanta Kumar Ghosh

Abstract

Speech production involves the movement of various articulators, including the tongue, jaw, and lips. Estimating the movement of the articulators from the acoustics of speech is known as acoustic-to-articulatory inversion (AAI). Recently, it has been shown that pooling the acoustic-articulatory data from multiple speakers is beneficial compared with training AAI in a speaker-specific manner. Further, conditioning AAI on speaker-specific information, supplied as a one-hot encoding at the input along with the acoustic features, improves AAI performance when the training and test speakers come from the same closed set. In this work, we carry out an experimental study on the benefit of using x-vectors to provide the speaker-specific information that conditions AAI. Experiments with 30 speakers show that AAI performance benefits from the use of x-vectors in a closed-set, seen-speaker condition. Further, x-vectors also generalize well in unseen-speaker evaluation.
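The abstract says only that speaker information is supplied "at the input of AAI along with acoustic features". The Python sketch below illustrates one common way to do that: append the same speaker vector to every acoustic frame. The frame-wise concatenation, the helper names (`one_hot`, `condition_frames`), and all dimensions other than the 30 speakers are assumptions made for illustration, not details taken from the paper.

```python
def one_hot(speaker_index, num_speakers):
    """Return a one-hot speaker code; only defined for speakers in the closed set."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker is not in the closed training set")
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]


def condition_frames(acoustic_frames, speaker_vector):
    """Append the same speaker vector to every acoustic frame (assumed scheme)."""
    return [list(frame) + list(speaker_vector) for frame in acoustic_frames]


# Hypothetical toy input: 3 frames of 4-dimensional acoustic features.
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# One-hot conditioning: works only for the 30 speakers seen in training.
seen_input = condition_frames(frames, one_hot(2, 30))

# x-vector conditioning: a fixed-length speaker embedding produced by a
# separately trained speaker model. This 5-dimensional vector is a placeholder,
# not a real x-vector.
placeholder_xvector = [0.12, -0.40, 0.33, 0.05, -0.21]
xvec_input = condition_frames(frames, placeholder_xvector)

print(len(seen_input[0]), len(xvec_input[0]))
```

A one-hot code has no entry for a speaker outside the training set, whereas an embedding can be computed for any speaker from their speech. This is why unseen-speaker evaluation is possible with x-vectors.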
