Paper Title

Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

Authors

Xiaosu Tong, Che-Wei Huang, Sri Harish Mallidi, Shaun Joseph, Sonal Pareek, Chander Chandak, Ariya Rastrow, Roland Maas

Abstract

In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. The streaming capability is achieved by using unidirectional LSTM layers and a causal mean aggregation layer to form the final utterance-level prediction up to the current frame. To avoid redundant computation during online streaming inference, we use a caching mechanism for every convolution operation. Experimental results on a device-directed vs. non-device-directed task show that the proposed model yields an equal error rate reduction of 41% compared to our previous best model on this task. Furthermore, we show that the proposed model is able to make accurate predictions earlier in time than the attention-based models.
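
The causal mean aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name CausalMeanAggregator, the helper causal_mean, and the plain-list representation of per-frame LSTM outputs are all assumptions made for illustration. The idea shown is only that the aggregate at frame t averages frames 1..t and never looks ahead, so a prediction is available at every frame.

```python
from typing import List, Sequence


class CausalMeanAggregator:
    """Running mean over the frames seen so far (hypothetical helper).

    Keeps a running sum and a frame count, so each update costs O(dim)
    and never touches future frames.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.count = 0
        self.total = [0.0] * dim

    def update(self, frame: Sequence[float]) -> List[float]:
        """Consume one frame and return the mean of all frames so far."""
        if len(frame) != self.dim:
            raise ValueError("frame has unexpected dimensionality")
        self.count += 1
        self.total = [s + x for s, x in zip(self.total, frame)]
        return [s / self.count for s in self.total]


def causal_mean(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the causal mean at every frame of a non-empty sequence."""
    aggregator = CausalMeanAggregator(len(frames[0]))
    return [aggregator.update(frame) for frame in frames]


if __name__ == "__main__":
    # Stand-in values for per-frame LSTM outputs (assumed, not real data).
    outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
    for t, mean in enumerate(causal_mean(outputs), start=1):
        print(t, mean)
```

In a streaming setting, update would be called once per incoming frame and its result passed to the classifier head, which is what lets the model emit a decision before the utterance ends.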
