Paper Title
Interpretable Visual Reasoning via Induced Symbolic Space
Paper Authors
Paper Abstract
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images, and achieve an interpretable model by working in the induced symbolic concept space. To this end, we first design a new framework named Object-Centric Compositional Attention Model (OCCAM) to perform the visual reasoning task with object-level visual features. Then, we propose a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by applying OCCAM to objects represented in the induced symbolic concept space. Our model design makes this an easy adaptation: we first predict the concepts of objects and relations and then project the predicted concepts back to the visual feature space, so the compositional reasoning module can process them as usual. Experiments on the CLEVR and GQA datasets demonstrate that: 1) our OCCAM achieves a new state of the art without human-annotated functional programs; 2) our induced concepts are both accurate and sufficient, as OCCAM achieves on-par performance on objects represented either in visual features or in the induced symbolic concept space.
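To make the predict-then-project step concrete, below is a minimal PyTorch sketch of a concept bottleneck, assuming a single flat concept vocabulary and a soft-mixture projection; all module names and dimensions are illustrative assumptions, not the paper's actual implementation (OCCAM induces separate attribute and relation concepts from attention patterns).

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Hypothetical sketch of the step described in the abstract: object
    features -> predicted concepts (symbolic space) -> projection back to
    the visual feature space, so the downstream compositional reasoning
    module can run unchanged."""

    def __init__(self, feat_dim: int = 512, num_concepts: int = 32):
        super().__init__()
        # Predict a distribution over induced concepts for each object.
        self.to_concepts = nn.Linear(feat_dim, num_concepts)
        # One learned embedding per concept, living in the visual feature space.
        self.concept_embeddings = nn.Embedding(num_concepts, feat_dim)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (batch, num_objects, feat_dim)
        probs = self.to_concepts(obj_feats).softmax(dim=-1)  # (B, N, C)
        # Project predicted concepts back to visual features as a convex
        # combination of the concept embeddings.
        return probs @ self.concept_embeddings.weight        # (B, N, feat_dim)

bottleneck = ConceptBottleneck()
objects = torch.randn(2, 10, 512)   # two images, ten detected objects each
reprojected = bottleneck(objects)   # same shape; feed to the reasoning module
```

A hard (argmax) selection over concepts could replace the soft mixture at inference time to make the intermediate representation fully symbolic, at the cost of differentiability during training.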