Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Paper Title

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Author

Ultes, Stefan

Abstract

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been in the focus of research for many years. While most work which is based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we use a reward based on user satisfaction estimation. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. Furthermore, we apply this novel user satisfaction estimation model live in simulated experiments where the satisfaction estimation model is trained on one domain and applied in many other domains which cover a similar task. We show that applying this model results in higher estimated satisfaction, similar task success rates and a higher robustness to noise.
