Paper Title
Data Augmentation for Spoken Language Understanding via Pretrained Language Models
Paper Authors
Abstract
The training of spoken language understanding (SLU) models often faces the problem of data scarcity. In this paper, we put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances. Furthermore, we investigate and propose solutions to two previously overlooked semi-supervised learning scenarios of data scarcity in SLU: i) Rich-in-Ontology: ontology information with numerous valid dialogue acts is given; ii) Rich-in-Utterance: a large number of unlabelled utterances are available. Empirical results show that our method can produce synthetic training data that boosts the performance of language understanding models in various scenarios.
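The core augmentation idea in the abstract, conditioning a pretrained language model on a dialogue act so it generates a synthetic training utterance, can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the act serialization format and the `generate_utterance` stub are assumptions, and a real system would replace the stub with a fine-tuned pretrained LM (e.g. a GPT-style decoder).

```python
# Illustrative sketch of LM-based data augmentation for SLU (not the paper's code).
# A dialogue act (intent + slot-value pairs) is serialized into a text prompt;
# a pretrained LM conditioned on that prompt would then produce a natural
# utterance, which is paired with the act to form a new synthetic example.

def serialize_act(intent, slots):
    """Flatten a dialogue act into a prompt string for a conditional LM."""
    slot_str = " ; ".join(f"{k} = {v}" for k, v in slots.items())
    return f"intent: {intent} | slots: {slot_str} | utterance:"

def generate_utterance(prompt, lm=None):
    """Placeholder for a pretrained-LM call (hypothetical `lm` callable).
    The offline fallback below is for demonstration only."""
    if lm is not None:
        return lm(prompt)  # assumed callable wrapping the real model
    intent = prompt.split("|")[0].split(":")[1].strip()
    return f"(synthetic utterance for intent '{intent}')"

# Build one synthetic (utterance, act) pair for the labelled training set.
act = ("book_restaurant", {"cuisine": "italian", "time": "7pm"})
prompt = serialize_act(*act)
print(prompt)
print(generate_utterance(prompt))
```

In the Rich-in-Ontology scenario described above, prompts like this could be built from every valid dialogue act in the ontology; in the Rich-in-Utterance scenario, the direction would instead be reversed (labelling unlabelled utterances).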