Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Paper title

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Authors

Li, Xiang; Song, Changhe; Wei, Xianhao; Wu, Zhiyong; Jia, Jia; Meng, Helen

Abstract

Cross-speaker style transfer aims to extract the speech style of a given reference speech so that it can be reproduced in the timbre of arbitrary target speakers. Existing methods on this topic have explored utilizing utterance-level style labels to perform style transfer via either global-scale or local-scale style representations. However, audiobook datasets are typically characterized by both local prosody and global genre, and are rarely accompanied by utterance-level style labels. Thus, properly transferring the reading style across different speakers remains a challenging task. This paper introduces a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speech. Moreover, by disentangling speaker timbre and style with the proposed switchable adversarial classifiers, the extracted reading style is made adaptable to the timbre of different speakers. Experimental results confirm that the model manages to transfer a given reading style to new target speakers. With the support of local prosody and global genre type predictors, the potential of the proposed method in multi-speaker audiobook generation is further revealed.
