Paper Title
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Paper Authors
Abstract
Attention-based end-to-end text-to-speech synthesis (TTS) is superior to conventional statistical methods in many ways. Transformer-based TTS is one such successful implementation. While Transformer TTS models the speech frame sequence well with a self-attention mechanism, it does not associate input text with output utterances from a syntactic point of view at the sentence level. We propose a novel neural TTS model, denoted as GraphSpeech, that is formulated under the graph neural network framework. GraphSpeech explicitly encodes the syntactic relations of input lexical tokens in a sentence, and incorporates such information to derive syntactically motivated character embeddings for the TTS attention mechanism. Experiments show that GraphSpeech consistently outperforms the Transformer TTS baseline in terms of spectrum and prosody rendering of utterances.
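The core idea of restricting attention to syntactic neighbours can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's actual architecture: it implements one generic GAT-style update (Velickovic et al.) in NumPy, where each token attends only to tokens it is connected to in a dependency graph, producing syntax-aware token embeddings. The function name, the LeakyReLU slope, and the toy sentence are all illustrative choices.

```python
import numpy as np

def syntax_graph_attention(embeddings, adjacency, w, a):
    """One GAT-style pass: each token attends only to its syntactic
    neighbours (edges of a dependency graph, self-loops included),
    so a token's output embedding mixes in information from words
    it is grammatically related to.

    embeddings: (n, d_in) token embeddings
    adjacency:  (n, n) 0/1 dependency-graph adjacency (with self-loops)
    w:          (d_in, d_out) shared projection
    a:          (2 * d_out,) attention vector over concatenated pairs
    """
    h = embeddings @ w                       # project: (n, d_out)
    out = np.zeros_like(h)
    for i in range(h.shape[0]):
        neigh = np.where(adjacency[i] > 0)[0]          # syntactic neighbours
        logits = np.array([np.concatenate([h[i], h[j]]) @ a for j in neigh])
        logits = np.where(logits > 0, logits, 0.2 * logits)  # LeakyReLU
        alpha = np.exp(logits - logits.max())                # stable softmax
        alpha /= alpha.sum()
        out[i] = (alpha[:, None] * h[neigh]).sum(axis=0)     # weighted mix
    return out

# Toy 3-token sentence "cats chase mice" with dependency edges
# cats <- chase -> mice (plus self-loops):
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
w = rng.normal(size=(4, 4))
a = rng.normal(size=(8,))
syntax_embeddings = syntax_graph_attention(emb, adj, w, a)
```

In the full model, embeddings like these would feed the TTS attention mechanism in place of plain character embeddings; the paper's encoder additionally handles relation types and multi-head attention, which this sketch omits.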