Paper Title
CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation
Paper Authors
Paper Abstract
Practical dialog systems need to deal with various knowledge sources, noisy user expressions, and the shortage of annotated data. To better address these problems, we propose CGoDial, a new challenging and comprehensive Chinese benchmark for multi-domain Goal-oriented Dialog evaluation. It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources: 1) a slot-based dialog (SBD) dataset with table-formed knowledge, 2) a flow-based dialog (FBD) dataset with tree-formed knowledge, and 3) a retrieval-based dialog (RBD) dataset with candidate-formed knowledge. To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing. The proposed experimental settings combine training on either the entire training set or a few-shot training set with testing on either the standard test set or a hard test subset, which assesses model capabilities in terms of general prediction, fast adaptability, and reliable robustness.
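Since the abstract defines its evaluation protocol as a 2x2 grid (full vs. few-shot training, standard vs. hard testing), the following minimal Python sketch enumerates those four settings and the capability each is meant to probe. All identifiers (TRAIN_SETS, CAPABILITY, etc.) are hypothetical and not taken from the CGoDial paper or its released code; the label for the few-shot + hard cell is our own inference, as the abstract names only three capabilities.

    # Illustrative sketch of the 2x2 evaluation grid; names are assumptions,
    # not the paper's actual API.
    from itertools import product

    TRAIN_SETS = ("full", "few_shot")   # entire training set vs. few-shot training set
    TEST_SETS = ("standard", "hard")    # standard test set vs. hard test subset

    # Capability each train/test combination probes, per the abstract; the
    # few-shot + hard cell is our own reading (both properties at once).
    CAPABILITY = {
        ("full", "standard"): "general prediction",
        ("few_shot", "standard"): "fast adaptability",
        ("full", "hard"): "reliable robustness",
        ("few_shot", "hard"): "fast adaptability + reliable robustness",
    }

    for train, test in product(TRAIN_SETS, TEST_SETS):
        print(f"train={train:<8} test={test:<8} -> probes: {CAPABILITY[(train, test)]}")

Running the sketch prints one line per setting, making explicit that a single model is scored under all four train/test combinations rather than on a single leaderboard split.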