Paper title
SUMBT+LaRL: Effective Multi-domain End-to-end Neural Task-oriented Dialog System
Paper authors
Paper abstract
Neural approaches to the individual components of task-oriented dialog systems have improved remarkably in recent years, yet optimizing overall system performance remains a challenge. Moreover, previous research on modeling complicated multi-domain goal-oriented dialogs in an end-to-end fashion has been limited. In this paper, we present SUMBT+LaRL, an effective multi-domain end-to-end trainable neural dialog system that combines two strong previous models and makes them fully differentiable. Specifically, SUMBT+ estimates user acts as well as dialog belief states, and LaRL models latent system action spaces and generates responses given the estimated contexts. We emphasize that a three-step training framework significantly and stably increases dialog success rates: separately pretraining SUMBT+ and LaRL, fine-tuning the entire system, and then applying reinforcement learning to the dialog policy. We also introduce new reward criteria for reinforcement learning of the dialog policy, and we discuss how the experimental results depend on the reward criteria and on different dialog evaluation methods. Our model achieves a new state-of-the-art success rate of 85.4% on corpus-based evaluation and a comparable success rate of 81.40% on the simulator-based evaluation provided by the DSTC8 challenge. To the best of our knowledge, our work is the first comprehensive study of a modularized E2E multi-domain dialog system that learns everything from each component to the entire dialog policy for task success.