Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

Paper Title

Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

Authors

Chae, Jinyeong; Kim, Jihie

Abstract

Knowledge-based visual question answering (KVQA) aims to answer questions that require external knowledge in addition to an understanding of the image and the question. Recent studies on KVQA inject external knowledge in a multi-modal form; as more knowledge is used, irrelevant information may be introduced and confuse the question answering. To use the knowledge properly, this study proposes the following: 1) we introduce a novel semantic inconsistency measure computed from caption uncertainty and semantic similarity; 2) we propose a new external knowledge assimilation method based on the semantic inconsistency measure and apply it to integrate explicit and implicit knowledge for KVQA; 3) we evaluate the proposed method on the OK-VQA dataset, where it achieves state-of-the-art performance.
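The abstract names the two ingredients of the semantic inconsistency measure (caption uncertainty and semantic similarity) but not how they are computed or combined. The sketch below is purely illustrative and is not the authors' formulation. It assumes that caption uncertainty is the mean normalized token entropy of a generated caption, that semantic similarity is the cosine similarity between a caption embedding and a knowledge embedding, and that the two are blended with a hypothetical weight `alpha`. The helper `gate_knowledge` is likewise a hypothetical way to down-weight knowledge that scores as inconsistent.

```python
import math
from typing import List, Sequence


def normalized_entropy(dist: Sequence[float]) -> float:
    """Shannon entropy of one token distribution, scaled to [0, 1] by log(vocab size)."""
    if len(dist) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist if p > 0.0)
    return h / math.log(len(dist))


def caption_uncertainty(token_dists: Sequence[Sequence[float]]) -> float:
    """Hypothetical proxy: mean normalized entropy over the caption's tokens."""
    if not token_dists:
        return 0.0
    return sum(normalized_entropy(d) for d in token_dists) / len(token_dists)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def semantic_inconsistency(
    token_dists: Sequence[Sequence[float]],
    caption_emb: Sequence[float],
    knowledge_emb: Sequence[float],
    alpha: float = 0.5,  # hypothetical blending weight, not from the paper
) -> float:
    """Hypothetical score in [0, 1]: high when the caption is uncertain
    and semantically far from the retrieved knowledge."""
    u = caption_uncertainty(token_dists)
    # Map cosine similarity from [-1, 1] to a dissimilarity in [0, 1].
    d = (1.0 - cosine_similarity(caption_emb, knowledge_emb)) / 2.0
    return alpha * u + (1.0 - alpha) * d


def gate_knowledge(knowledge_emb: Sequence[float], inconsistency: float) -> List[float]:
    """Hypothetical assimilation step: scale a knowledge vector by (1 - inconsistency)."""
    w = 1.0 - inconsistency
    return [w * x for x in knowledge_emb]
```

A confident caption (one-hot token distributions) whose embedding matches the knowledge embedding scores near 0, so the knowledge passes through almost unchanged. A maximally uncertain caption whose embedding points opposite to the knowledge scores 1, so the knowledge is suppressed entirely. The real method may weight, normalize, or fuse these signals quite differently.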
