Paper Title
A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese
Paper Authors
Paper Abstract
Grapheme-to-phoneme (G2P) conversion is an indispensable part of a Mandarin Chinese text-to-speech (TTS) system, and the core of G2P conversion is polyphone disambiguation, i.e., picking the correct pronunciation from several candidates for a Chinese polyphonic character. In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters. First, we create 741 new Chinese monophonic characters from 354 source Chinese polyphonic characters, one per pronunciation. Then we obtain a Chinese polyphone BERT by extending a pre-trained Chinese BERT with the 741 new monophonic characters and adding a corresponding embedding layer for the new tokens, which is initialized from the embeddings of the source polyphonic characters. In this way, we turn the polyphone disambiguation task into a pre-training-style task of the Chinese polyphone BERT. Experimental results demonstrate the effectiveness of the proposed model: the polyphone BERT model obtains a 2% improvement in average accuracy (from 92.1% to 94.1%) over the BERT-based classifier model, the prior state of the art in polyphone disambiguation.
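The vocabulary-extension step described in the abstract (adding per-pronunciation tokens and copying over the source character's embedding) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the Hugging Face `transformers` library, the `bert-base-chinese` checkpoint, and a hypothetical two-character `polyphone_map` (the paper uses 354 source characters and 741 new tokens); the token names such as `行_xing2` are made up for illustration.

```python
# Minimal sketch (assumptions noted above): extend a pre-trained Chinese BERT with
# new per-pronunciation "monophonic" tokens and initialize each new token's
# embedding from its source polyphonic character's embedding.
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Hypothetical mapping: source polyphonic character -> new per-pronunciation tokens.
polyphone_map = {
    "行": ["行_xing2", "行_hang2"],
    "乐": ["乐_le4", "乐_yue4"],
}

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# Add the new monophonic tokens to the vocabulary and grow the embedding matrix.
new_tokens = [t for toks in polyphone_map.values() for t in toks]
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# Copy the source character's embedding into each new token's embedding row.
emb = model.get_input_embeddings().weight
with torch.no_grad():
    for src_char, toks in polyphone_map.items():
        src_id = tokenizer.convert_tokens_to_ids(src_char)
        for tok in toks:
            emb[tokenizer.convert_tokens_to_ids(tok)] = emb[src_id].clone()
```

With the vocabulary extended this way, the disambiguation task can be trained like the BERT pre-training objective: at each polyphonic character position the model predicts which of that character's new monophonic tokens belongs there. Restricting the output scores at inference to the character's own candidate tokens is one natural implementation choice; the abstract does not spell out this detail.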