IMaSC -- ICFOSS Malayalam Speech Corpus

Paper title

IMaSC -- ICFOSS Malayalam Speech Corpus

Authors

Deepa P Gopinath, Thennal D K, Vrinda V Nair, Swaraj K S, Sachin G

Abstract

Modern text-to-speech (TTS) systems use deep learning to synthesize speech that increasingly approaches human quality, but they require a database of high-quality audio-text sentence pairs for training. Malayalam, the official language of the Indian state of Kerala and spoken by 35+ million people, is a low-resource language in terms of available corpora for TTS systems. In this paper, we present IMaSC, a Malayalam text and speech corpus containing approximately 50 hours of recorded speech. With 8 speakers and a total of 34,473 text-audio pairs, IMaSC is larger than every other publicly available alternative. We evaluated the database by using it to train TTS models for each speaker based on a modern deep learning architecture. Via subjective evaluation, we show that our models perform significantly better in terms of naturalness than previous studies and publicly available models, with an average mean opinion score of 4.50, indicating that the synthesized speech is close to human quality.
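
The abstract reports naturalness as a mean opinion score (MOS). As a rough illustration of how such a score is usually aggregated, the sketch below averages listener ratings on a 1-5 scale and attaches a normal-approximation 95% confidence interval. The function name and the ratings are hypothetical and are not taken from the paper; the authors' actual evaluation protocol is not described in this abstract.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = statistics.fmean(ratings)
    # Normal-approximation interval: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings for illustration only; not data from the paper.
example_ratings = [5, 4, 5, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of ratings the interval is wide, which is why subjective evaluations typically collect many ratings per system before comparing scores.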
