深层贝叶斯网络用于视觉问题的生成

论文标题

深层贝叶斯网络用于视觉问题的生成

Deep Bayesian Network for Visual Question Generation

论文作者

Patro, Badri N., Kurmi, Vinod K., Kumar, Sandeep, Namboodiri, Vinay P.

论文摘要

从图像中产生自然问题是一项语义任务，需要使用视觉和语言方式来学习多模式表示。图像可以具有多个视觉和语言提示，例如位置，字幕和标签。在本文中，我们提出了一个原则上的深贝叶斯学习框架，该框架结合了这些线索以产生自然问题。我们观察到，通过增加更多的提示并通过最大程度地减少线索中的不确定性，贝叶斯网络变得更加自信。我们提出了最小化提示（MUMC）混合物的不确定性，从而最大程度地减少了提示专家混合物中存在的不确定性，以产生概率问题。这是一个贝叶斯框架，结果与人类研究验证的自然问题相似。我们观察到，通过增加更多的提示并通过最大程度地减少提示中的不确定性，贝叶斯框架变得更加自信。对我们的模型的消融研究表明，在此任务下，线索的一部分较低，因此优先考虑提示的原则融合。此外，我们观察到，所提出的方法对定量指标的最先进基准（Bleu-N，Meteor，Rouge和Cider）有了显着改善。在这里，我们提供了深贝叶斯VQG \ url {https://delta-lab-iitk.github.io/bvqg/}的项目链接

Generating natural questions from an image is a semantic task that requires using vision and language modalities to learn multimodal representations. Images can have multiple visual and language cues such as places, captions, and tags. In this paper, we propose a principled deep Bayesian learning framework that combines these cues to produce natural questions. We observe that with the addition of more cues and by minimizing uncertainty in the among cues, the Bayesian network becomes more confident. We propose a Minimizing Uncertainty of Mixture of Cues (MUMC), that minimizes uncertainty present in a mixture of cues experts for generating probabilistic questions. This is a Bayesian framework and the results show a remarkable similarity to natural questions as validated by a human study. We observe that with the addition of more cues and by minimizing uncertainty among the cues, the Bayesian framework becomes more confident. Ablation studies of our model indicate that a subset of cues is inferior at this task and hence the principled fusion of cues is preferred. Further, we observe that the proposed approach substantially improves over state-of-the-art benchmarks on the quantitative metrics (BLEU-n, METEOR, ROUGE, and CIDEr). Here we provide project link for Deep Bayesian VQG \url{https://delta-lab-iitk.github.io/BVQG/}

下载PDF全文

下载文献需遵守相关版权规定

论文标题