Paper Title

Device Directedness with Contextual Cues for Spoken Dialog Systems

Authors

Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

Abstract

In this work, we define barge-in verification as a supervised learning task in which only audio information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infuse lexical information directly into speech representations to improve the domain-specific language information implicitly learned during pre-training. Experiments on spoken dialog data show that our proposed model, which validates barge-ins entirely from speech representations, is 38% faster (relative) and achieves a 4.5% relative F1 score improvement over a baseline LSTM model that uses both audio and Automatic Speech Recognition (ASR) 1-best hypotheses. Building on this, our best proposed model, which combines lexically infused representations with contextual features, provides a further 5.7% relative improvement in F1 score, while being only 22% faster than the baseline (compared with 38% for the speech-only model).
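The abstract frames barge-in verification as binary classification over utterance-level speech representations, optionally combined with contextual features, and reports every result as a relative change. The sketch below illustrates only that generic framing, under loud assumptions: random vectors stand in for the self-supervised speech features, a single hypothetical context feature stands in for the paper's contextual cues, and mean pooling plus logistic regression stands in for the paper's classifier. The lexical-infusion technique is not reproduced, and all names (`featurize`, `make_toy_data`, `relative_change`, etc.) are hypothetical. It also shows what "4.5% relative" means: (F1_new − F1_base) / F1_base = 0.045, not a 4.5-point gain.

```python
import math
import random


def mean_pool(frames):
    """Average frame-level vectors into a single utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def featurize(frames, context):
    """Concatenate pooled speech features with (hypothetical) contextual features."""
    return mean_pool(frames) + list(context)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, lr=0.5, epochs=200):
    """Batch gradient descent on binary cross-entropy (a stand-in classifier)."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 for a true barge-in, 0 for a false barge-in."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


def f1_score(y_true, y_pred):
    """F1 for the positive (true barge-in) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_change(new, base):
    """Relative change in percent: 100 * (new - base) / base."""
    return 100.0 * (new - base) / base


def make_toy_data(n, seed=0):
    """Synthetic utterances only (NOT real data): 10 frames of 2-d 'speech'
    features plus one hypothetical context feature, shifted apart by class."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        shift = 1.0 if y else -1.0
        frames = [[shift + rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
        context = [0.5 * shift + rng.gauss(0, 0.5)]
        xs.append(featurize(frames, context))
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    train_x, train_y = make_toy_data(200, seed=0)
    w, b = train_logreg(train_x, train_y)
    test_x, test_y = make_toy_data(100, seed=1)
    preds = [predict(w, b, x) for x in test_x]
    print("toy F1:", round(f1_score(test_y, preds), 3))
    # Hypothetical F1 values: a 4.5% relative gain takes 0.800 to 0.836.
    print("relative F1 change (%):", round(relative_change(0.836, 0.800), 2))
```

Mean pooling and logistic regression were chosen only to keep the sketch dependency-free; in a real system the frame-level features would come from a pretrained self-supervised speech encoder, and neither the paper's classifier nor its lexical-infusion method is reflected here.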