Paper Title


Co-VQA: Answering by Interactive Sub Question Sequence

Paper Authors

Ruonan Wang, Yuxi Qian, Fangxiang Feng, Xiaojie Wang, Huixing Jiang

Paper Abstract


Most existing approaches to Visual Question Answering (VQA) answer questions directly; people, however, usually decompose a complex question into a sequence of simple sub questions and obtain the answer to the original question only after answering the sub question sequence (SQS). By simulating this process, this paper proposes a conversation-based VQA (Co-VQA) framework, which consists of three components: Questioner, Oracle, and Answerer. Questioner raises the sub questions using an extended HRED model, and Oracle answers them one-by-one. An Adaptive Chain Visual Reasoning Model (ACVRM) is also proposed for Answerer, in which question-answer pairs are used to update the visual representation sequentially. To perform supervised learning for each model, we introduce a well-designed method to build an SQS for each question on the VQA 2.0 and VQA-CP v2 datasets. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Further analyses show that SQSs help build direct semantic connections between questions and images, provide question-adaptive variable-length reasoning chains, and offer explicit interpretability as well as error traceability.
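
Below is a minimal sketch of the Questioner-Oracle-Answerer interaction loop that the abstract describes. All class names, method signatures, the stopping criterion, and the maximum chain length here are illustrative assumptions for readability; they are not the authors' actual implementation.

import torch.nn as nn

class CoVQA(nn.Module):
    """Hypothetical wiring of the Co-VQA loop: Questioner raises sub
    questions, Oracle answers them one-by-one, and Answerer (ACVRM)
    updates the visual representation with each question-answer pair
    before predicting the final answer."""

    def __init__(self, questioner, oracle, answerer, max_steps=5):
        super().__init__()
        self.questioner = questioner  # e.g. an extended HRED model
        self.oracle = oracle          # answers each sub question
        self.answerer = answerer      # ACVRM (interface assumed below)
        self.max_steps = max_steps    # cap on SQS length (assumption)

    def forward(self, image_feats, question):
        v = image_feats   # visual representation, updated sequentially
        history = []      # dialogue history of (sub_q, sub_a) pairs
        for _ in range(self.max_steps):
            # Assumed interface: Questioner also emits a stop signal,
            # giving a question-adaptive, variable-length reasoning chain.
            sub_q, stop = self.questioner(question, history)
            if stop:
                break
            sub_a = self.oracle(v, sub_q)
            history.append((sub_q, sub_a))
            # ACVRM step: each QA pair conditions an update of the
            # visual state (assumed `update`/`predict` method names).
            v = self.answerer.update(v, sub_q, sub_a)
        return self.answerer.predict(v, question)

Because every intermediate sub question and answer is materialized in `history`, a wrong final answer can be traced back to the first faulty step, which is the interpretability and error-traceability property the abstract claims for SQSs.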
