Device Directedness with Contextual Cues for Spoken Dialog Systems

Paper Title

Device Directedness with Contextual Cues for Spoken Dialog Systems

Authors

Bekal, Dhanush; Srinivasan, Sundararajan; Bodapati, Sravan; Ronanki, Srikanth; Kirchhoff, Katrin

Abstract

In this work, we define barge-in verification as a supervised learning task in which audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infuse lexical information directly into speech representations to improve the domain-specific language information implicitly learned during pre-training. Experiments conducted on spoken dialog data show that our proposed model, which validates barge-ins entirely from speech representations, is 38% faster (relative) and achieves a 4.5% relative F1 score improvement over a baseline LSTM model that uses both audio and Automatic Speech Recognition (ASR) 1-best hypotheses. On top of this, our best proposed model, which combines lexically infused representations with contextual features, provides a further 5.7% relative improvement in F1 score while remaining 22% faster than the baseline.
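The abstract frames barge-in verification as binary classification over pre-trained speech representations, optionally combined with contextual features. A minimal sketch of such a downstream head follows, assuming mean pooling over frame-level representations, a logistic-regression output, and a hypothetical context vector. The function names (`mean_pool`, `barge_in_probability`, `classify`), the pooling choice, and the weights are illustrative assumptions, not the authors' architecture, and the lexical-infusion step is not modeled.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level speech representations into one utterance vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def barge_in_probability(
    frames: Sequence[Sequence[float]],
    context: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Hypothetical logistic head: P(true barge-in | speech, context).

    The pooled speech vector is concatenated with the contextual features,
    then scored with a single linear layer followed by a sigmoid.
    """
    features = mean_pool(frames) + list(context)
    if len(features) != len(weights):
        raise ValueError("weights must match pooled dim + context dim")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(
    frames: Sequence[Sequence[float]],
    context: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> str:
    """Label the utterance as a 'true' or 'false' barge-in."""
    p = barge_in_probability(frames, context, weights, bias)
    return "true" if p >= threshold else "false"


if __name__ == "__main__":
    frames = [[0.2, -0.1], [0.4, 0.3]]  # toy stand-in for encoder outputs
    context = [1.0]                      # toy stand-in for a contextual cue
    weights = [2.0, -1.0, 0.5]           # 2 speech dims + 1 context dim
    print(classify(frames, context, weights, bias=-0.5))
```

Here `frames` stands in for the per-frame outputs of a self-supervised speech encoder and `context` for whatever contextual cues are available; in a real system the weights would be learned from labeled true and false barge-ins rather than set by hand.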
