Paper Title

How "open" are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

Authors

Doğruöz, A. Seza, Skantze, Gabriel

Abstract

Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of "openness", we conduct two studies: First, we classify the types of "speech events" encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the "small talk" category and exclude the other speech event categories encountered in real life human-human communication. Second, we conduct a small-scale pilot study to generate online conversations covering a wider range of speech event categories between two humans vs. a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the human-chatbot conversations lack coherence in most speech event categories. Based on these results, we suggest (a) using the term "small talk" instead of "open-domain" for the current chatbots which are not that "open" in terms of conversational abilities yet, and (b) revising the evaluation methods to test the chatbot conversations against other speech events.
