Title
Multimodal Dialogue State Tracking By QA Approach with Data Augmentation
Authors
Abstract
Recently, a more challenging state tracking task, Audio-Video Scene-Aware Dialogue (AVSD), has been attracting increasing attention among researchers. Unlike purely text-based dialogue state tracking, a dialogue in AVSD contains a sequence of question-answer pairs about a video, and answering the final question requires additional understanding of the video. This paper interprets the AVSD task from an open-domain Question Answering (QA) point of view and proposes a multimodal open-domain QA system to address the problem. The proposed QA system uses a common encoder-decoder framework with multimodal fusion and attention. Teacher forcing is applied to train the natural language generator. We also propose a new data augmentation approach specifically under the QA assumption. Our experiments show that our model and techniques bring significant improvements over the baseline model on the DSTC7-AVSD dataset and demonstrate the potential of our data augmentation techniques.
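The abstract names two standard techniques without detail: attention-based fusion of the text and video modalities inside an encoder-decoder, and teacher forcing for training the generator. The following is a minimal, hypothetical NumPy sketch of both ideas; all shapes, function names, and the toy decoder are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_fusion(query, video_feats, text_feats):
    """Fuse modalities by attending from a dialogue query (illustrative).

    query:       (d,)    encoding of the current question
    video_feats: (Tv, d) per-frame video features
    text_feats:  (Tt, d) per-token dialogue-history features
    Returns one fused context vector of shape (d,).
    """
    feats = np.concatenate([video_feats, text_feats], axis=0)  # (Tv+Tt, d)
    scores = feats @ query / np.sqrt(len(query))   # scaled dot-product scores
    weights = softmax(scores)                      # attention over both modalities
    return weights @ feats                         # weighted sum -> (d,)

def teacher_forced_loss(decoder_step, context, gold_tokens):
    """One training pass with teacher forcing (illustrative).

    decoder_step(prev_token, context) -> unnormalised logits (vocab_size,)
    At every step the *gold* previous token is fed to the decoder instead
    of the model's own prediction, which stabilises generator training.
    """
    loss = 0.0
    prev = 0  # assume index 0 is a <bos> token
    for gold in gold_tokens:
        logits = decoder_step(prev, context)
        probs = softmax(logits)
        loss -= np.log(probs[gold] + 1e-12)  # cross-entropy on the gold token
        prev = gold                          # teacher forcing: feed the gold token
    return loss / len(gold_tokens)
```

At inference time, by contrast, `prev` would be the model's own previous prediction; the mismatch between the two regimes is the usual trade-off accepted when using teacher forcing.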