Improving Language Identification for Multilingual Speakers

Paper Title

Improving Language Identification for Multilingual Speakers

Authors

Titus, Andrew; Silovsky, Jan; Chen, Nanxin; Hsiao, Roger; Young, Mary; Ghoshal, Arnab

Abstract

Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies. As we show in this work, LID systems can have a high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present. We address this by using coarser-grained targets for the acoustic LID model and integrating its outputs with interaction context signals in a context-aware model to tailor the system to each user. This combined system achieves an average 97% accuracy across all language combinations while improving worst-case accuracy by over 60% relative to our baseline.
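
The abstract contrasts average accuracy with worst-case accuracy across language combinations. The sketch below is not the authors' evaluation code; it only illustrates how a high average can hide one weak language pair and how a relative improvement is computed. The per-pair accuracies, the language codes, and the helper names `summarize` and `relative_improvement` are all hypothetical and chosen for illustration.

```python
from statistics import mean

# Hypothetical per-language-pair accuracies, invented for illustration only;
# these are NOT results from the paper.
pair_accuracy = {
    ("en", "es"): 0.98,
    ("en", "fr"): 0.97,
    ("en", "de"): 0.96,
    ("hi", "en"): 0.55,  # one pair that lags far behind the rest
}


def summarize(acc):
    """Return (average accuracy, worst-case accuracy, worst pair)."""
    worst_pair = min(acc, key=acc.get)
    return mean(acc.values()), acc[worst_pair], worst_pair


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction of baseline."""
    return (improved - baseline) / baseline


avg, worst, pair = summarize(pair_accuracy)
print(f"average={avg:.3f} worst={worst:.2f} on {pair}")
```

Reporting the minimum alongside the mean makes the underperforming combination visible, which is the gap the abstract says the combined system targets.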
