Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

Paper title

Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

Authors

Park, Dookun; Yuan, Hao; Kim, Dongmin; Zhang, Yinglei; Spyros, Matsoukas; Kim, Young-Bum; Sarikaya, Ruhi; Guo, Edward; Ling, Yuan; Quinn, Kevin; Hung, Pham; Yao, Benjamin; Lee, Sungjin

Abstract

Measuring user satisfaction is a challenging task and a critical component in developing large-scale conversational agent systems that serve the needs of real users. A widely used approach to this problem is to collect human annotation data and use it for evaluation or modeling. Human-annotation-based approaches are easier to control but hard to scale. A novel alternative is to collect users' direct feedback via a feedback elicitation system embedded in the conversational agent system, and to use the collected feedback to train a machine-learned model for generalization. User feedback is the best proxy for user satisfaction, but it is not available for some ineligible intents and in certain situations. Thus, these two types of approaches are complementary. In this work, we tackle the user satisfaction assessment problem with a hybrid approach that fuses explicit user feedback with user satisfaction predictions inferred by two machine-learned models, one trained on user feedback data and the other on human annotation data. The hybrid approach is based on a waterfall policy, and experimental results on Amazon Alexa's large-scale datasets show significant improvements in inferring user satisfaction. This paper presents a detailed hybrid architecture, an in-depth analysis of user feedback data, and an algorithm that generates datasets to properly simulate live traffic.
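The abstract names a waterfall policy over three signal sources but does not spell out the order or the fallback conditions. The following is a minimal illustrative sketch, not the authors' implementation. It assumes the order is explicit feedback first, then the model trained on user feedback (only for intents eligible for feedback collection), then the model trained on human annotations. The names `Turn`, `waterfall_satisfaction`, `feedback_model`, `annotation_model`, and `eligible_intents` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass
class Turn:
    """One user turn (hypothetical structure, for illustration only)."""
    intent: str
    # True = satisfied, False = dissatisfied, None = no explicit feedback collected.
    explicit_feedback: Optional[bool] = None


def waterfall_satisfaction(
    turn: Turn,
    feedback_model: Callable[[Turn], float],
    annotation_model: Callable[[Turn], float],
    eligible_intents: Set[str],
) -> Tuple[str, float]:
    """Return (source, satisfaction score in [0, 1]) using an assumed waterfall order."""
    # Stage 1 (assumed): explicit user feedback wins whenever it exists.
    if turn.explicit_feedback is not None:
        return ("explicit_feedback", 1.0 if turn.explicit_feedback else 0.0)
    # Stage 2 (assumed): the feedback-trained model covers feedback-eligible intents.
    if turn.intent in eligible_intents:
        return ("feedback_model", feedback_model(turn))
    # Stage 3 (assumed): the annotation-trained model is the final fallback.
    return ("annotation_model", annotation_model(turn))
```

Each stage only runs when the previous one cannot produce a signal, which is what makes the policy a waterfall; the actual ordering and any confidence thresholds used in the paper may differ from this sketch.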
