Paper Title


Co-VQA: Answering by Interactive Sub Question Sequence

Paper Authors

Ruonan Wang, Yuxi Qian, Fangxiang Feng, Xiaojie Wang, Huixing Jiang

Paper Abstract


Most existing approaches to Visual Question Answering (VQA) answer questions directly; people, however, usually decompose a complex question into a sequence of simple sub questions and obtain the answer to the original question only after answering the sub question sequence (SQS). By simulating this process, this paper proposes a conversation-based VQA (Co-VQA) framework, which consists of three components: Questioner, Oracle, and Answerer. Questioner raises the sub questions using an extended HRED model, and Oracle answers them one-by-one. An Adaptive Chain Visual Reasoning Model (ACVRM) is also proposed for Answerer, in which question-answer pairs are used to update the visual representation sequentially. To perform supervised learning for each model, we introduce a well-designed method to build an SQS for each question on the VQA 2.0 and VQA-CP v2 datasets. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Further analyses show that SQSs help build direct semantic connections between questions and images, provide question-adaptive variable-length reasoning chains, and offer explicit interpretability as well as error traceability.
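
Below is a minimal sketch of the Questioner-Oracle-Answerer interaction loop that the abstract describes. All class names, method signatures, the stopping criterion, and the maximum chain length here are illustrative assumptions for readability; they are not the authors' actual implementation.

import torch.nn as nn

class CoVQA(nn.Module):
    """Hypothetical wiring of the Co-VQA loop: Questioner raises sub
    questions, Oracle answers them one-by-one, and Answerer (ACVRM)
    updates the visual representation with each question-answer pair
    before predicting the final answer."""

    def __init__(self, questioner, oracle, answerer, max_steps=5):
        super().__init__()
        self.questioner = questioner  # e.g. an extended HRED model
        self.oracle = oracle          # answers each sub question
        self.answerer = answerer      # ACVRM (interface assumed below)
        self.max_steps = max_steps    # cap on SQS length (assumption)

    def forward(self, image_feats, question):
        v = image_feats   # visual representation, updated sequentially
        history = []      # dialogue history of (sub_q, sub_a) pairs
        for _ in range(self.max_steps):
            # Assumed interface: Questioner also emits a stop signal,
            # giving a question-adaptive, variable-length reasoning chain.
            sub_q, stop = self.questioner(question, history)
            if stop:
                break
            sub_a = self.oracle(v, sub_q)
            history.append((sub_q, sub_a))
            # ACVRM step: each QA pair conditions an update of the
            # visual state (assumed `update`/`predict` method names).
            v = self.answerer.update(v, sub_q, sub_a)
        return self.answerer.predict(v, question)

Because every intermediate sub question and answer is materialized in `history`, a wrong final answer can be traced back to the first faulty step, which is the interpretability and error-traceability property the abstract claims for SQSs.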
