Towards Boosting the Open-Domain Chatbot with Human Feedback

Title

Towards Boosting the Open-Domain Chatbot with Human Feedback

Authors

Hua Lu, Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang

Abstract

Many open-domain dialogue models pre-trained on social media comments can generate coherent replies but have difficulty producing engaging responses when interacting with real users. This phenomenon might mainly result from the scarcity of annotated human-human conversations and the misalignment with human preference. In this paper, we propose a novel and efficient approach, Diamante, to boost the open-domain chatbot, in which two kinds of human feedback (explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend model-generated candidate responses, Diamante efficiently collects human-demonstrated responses and constructs a Chinese chit-chat dataset. To enhance alignment with human preference, Diamante leverages the implicit preference in the data collection process and introduces generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and the joint training paradigm can significantly boost the performance of Chinese pre-trained dialogue models.
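The abstract only names the two feedback signals, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that the implicit preference is turned into pairwise comparisons in which the annotator's selected or amended response outranks every other model-generated candidate, and that generation-evaluation joint training adds a pairwise logistic ranking loss on evaluation scores to the usual generation negative log-likelihood. The names AnnotatedTurn, preference_pairs, pairwise_ranking_loss and joint_loss are hypothetical; in practice the scores and token log-probabilities would come from the dialogue model itself.

```python
import math
from dataclasses import dataclass


@dataclass
class AnnotatedTurn:
    # Hypothetical record of one annotation step.
    context: str
    candidates: list[str]   # model-generated candidates shown to the annotator
    human_response: str     # the candidate the annotator selected, or their amended version


def preference_pairs(turn: AnnotatedTurn) -> list[tuple[str, str]]:
    """Assumption: the human response is preferred over every other candidate."""
    return [(turn.human_response, c) for c in turn.candidates if c != turn.human_response]


def pairwise_ranking_loss(score_pos: float, score_neg: float) -> float:
    """-log sigmoid(score_pos - score_neg), computed in a numerically stable way."""
    x = score_pos - score_neg
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def joint_loss(gen_token_logprobs: list[float],
               pair_scores: list[tuple[float, float]]) -> float:
    """Sketch of a joint objective: mean token NLL of the human response
    plus the mean ranking loss over (preferred, rejected) score pairs."""
    nll = -sum(gen_token_logprobs) / len(gen_token_logprobs)
    if not pair_scores:
        return nll
    rank = sum(pairwise_ranking_loss(p, n) for p, n in pair_scores) / len(pair_scores)
    return nll + rank


if __name__ == "__main__":
    turn = AnnotatedTurn("你好", ["a", "b", "c"], "b")
    print(preference_pairs(turn))  # the selected "b" outranks "a" and "c"
```

If the annotator amends a candidate instead of selecting one, the amended text differs from every candidate, so it is paired against all of them. Summing the two losses reflects the assumption that one model serves as both generator and evaluator; weighting the terms differently would be an equally plausible design.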
