Paper Title

A Swiss German Dictionary: Variation in Speech and Writing

Authors

Larissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardžić, Claudiu Musat

Abstract

We introduce a dictionary containing forms of common words in various Swiss German dialects, normalized into High German. As Swiss German is, for now, a predominantly spoken language, there is significant variation in the written forms, even between speakers of the same dialect. To alleviate the uncertainty associated with this diversity, we complement the Swiss German and High German word pairs with SAMPA phonetic transcriptions of the Swiss German forms. The dictionary thus becomes the first resource to combine large-scale spontaneous translation with phonetic transcriptions. Moreover, we control for the regional distribution and ensure equal representation of the major Swiss dialects. The coupling of the phonetic and written Swiss German forms is powerful. We show that they are sufficient to train a Transformer-based phoneme-to-grapheme model that generates credible novel Swiss German writings. In addition, we show that the inverse mapping, from graphemes to phonemes, can be modeled with a Transformer trained on the novel dictionary. This generation of pronunciations for previously unknown words is key to training extensible automated speech recognition (ASR) systems, which are the main beneficiaries of this dictionary.
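
The abstract does not describe the dictionary's file format, so the following is only a minimal sketch of how entries of this kind could be arranged into training pairs for the two mappings it mentions. The `Entry` fields, the space-separated SAMPA convention, the dialect labels, and the two sample rows are all assumptions made for illustration; they are not taken from the published resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary row. Field names are hypothetical, not the published schema."""
    high_german: str   # normalized High German form
    swiss_german: str  # one written Swiss German variant
    sampa: str         # SAMPA transcription, assumed space-separated per symbol
    dialect: str       # region label (hypothetical)


def g2p_pairs(entries):
    """Grapheme-to-phoneme pairs: written characters in, SAMPA symbols out."""
    return [(list(e.swiss_german), e.sampa.split()) for e in entries]


def p2g_pairs(entries):
    """Phoneme-to-grapheme pairs: the same data with source and target swapped."""
    return [(phonemes, graphemes) for graphemes, phonemes in g2p_pairs(entries)]


# Placeholder rows for illustration only; not copied from the dictionary.
entries = [
    Entry("nicht", "nöd", "n 2 d", "Zürich"),
    Entry("nicht", "nid", "n i d", "Bern"),
]

for source, target in g2p_pairs(entries):
    print(source, "->", target)
```

Either list of pairs could then be fed to any sequence-to-sequence trainer; the choice of a Transformer, as in the abstract, is independent of this data layout.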
