Title
Multimodal Dialogue State Tracking By QA Approach with Data Augmentation
Authors
Abstract
Recently, a more challenging state tracking task, Audio-Video Scene-Aware Dialogue (AVSD), has been attracting increasing attention among researchers. Unlike purely text-based dialogue state tracking, a dialogue in AVSD contains a sequence of question-answer pairs about a video, and answering the final question requires additional understanding of the video. This paper interprets the AVSD task from an open-domain Question Answering (QA) point of view and proposes a multimodal open-domain QA system to address the problem. The proposed QA system uses a common encoder-decoder framework with multimodal fusion and attention. Teacher forcing is applied to train the natural language generator. We also propose a new data augmentation approach specifically under the QA assumption. Our experiments show that our model and techniques bring significant improvements over the baseline model on the DSTC7-AVSD dataset and demonstrate the potential of our data augmentation techniques.
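The abstract names two standard techniques without detail: attention-based fusion of the text and video modalities inside an encoder-decoder, and teacher forcing for training the generator. The following is a minimal, hypothetical NumPy sketch of both ideas; all shapes, function names, and the toy decoder are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_fusion(query, video_feats, text_feats):
    """Fuse modalities by attending from a dialogue query (illustrative).

    query:       (d,)    encoding of the current question
    video_feats: (Tv, d) per-frame video features
    text_feats:  (Tt, d) per-token dialogue-history features
    Returns one fused context vector of shape (d,).
    """
    feats = np.concatenate([video_feats, text_feats], axis=0)  # (Tv+Tt, d)
    scores = feats @ query / np.sqrt(len(query))   # scaled dot-product scores
    weights = softmax(scores)                      # attention over both modalities
    return weights @ feats                         # weighted sum -> (d,)

def teacher_forced_loss(decoder_step, context, gold_tokens):
    """One training pass with teacher forcing (illustrative).

    decoder_step(prev_token, context) -> unnormalised logits (vocab_size,)
    At every step the *gold* previous token is fed to the decoder instead
    of the model's own prediction, which stabilises generator training.
    """
    loss = 0.0
    prev = 0  # assume index 0 is a <bos> token
    for gold in gold_tokens:
        logits = decoder_step(prev, context)
        probs = softmax(logits)
        loss -= np.log(probs[gold] + 1e-12)  # cross-entropy on the gold token
        prev = gold                          # teacher forcing: feed the gold token
    return loss / len(gold_tokens)
```

At inference time, by contrast, `prev` would be the model's own previous prediction; the mismatch between the two regimes is the usual trade-off accepted when using teacher forcing.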