On Building Spoken Language Understanding Systems for Low Resourced Languages

Paper title

On Building Spoken Language Understanding Systems for Low Resourced Languages

Paper author

Gupta, Akshat

Abstract

Spoken dialog systems are slowly becoming an integral part of the human experience due to their various advantages over textual interfaces. Spoken language understanding (SLU) systems are fundamental building blocks of spoken dialog systems. However, creating SLU systems for low-resourced languages is still a challenge. For a large number of low-resourced languages, we do not have access to enough data to build automatic speech recognition (ASR) technologies, which are fundamental to any SLU system. Also, ASR-based SLU systems do not generalize to unwritten languages. In this paper, we present a series of experiments to explore extremely low-resourced settings, where we perform intent classification with systems trained on as few as one data point per intent and with only one speaker in the dataset. We also work in a low-resourced setting where we do not use language-specific ASR systems to transcribe input speech, which compounds the challenge of building SLU systems and simulates a true low-resourced setting. We test our system on Belgian Dutch (Flemish) and English and find that using phonetic transcriptions to build intent classification systems in such low-resourced settings performs significantly better than using speech features. Specifically, when using a phonetic transcription based system over a feature based system, we see average improvements of 12.37% and 13.08% for binary and four-class classification problems respectively, when averaged over 49 different experimental settings.
