Paper Title
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Paper Authors
Paper Abstract
Phrase grounding models localize an object in an image given a referring expression. The annotated language queries available during training are limited, which also limits the variety of language combinations a model can observe. In this paper, we study how to exploit objects without labeled queries for semi-supervised phrase grounding. We propose learned location and subject embedding predictors (LSEP) to generate the corresponding language embeddings for objects that lack annotated queries in the training set. With the assistance of an object detector, we also apply LSEP to train the grounding model on images without any annotation. We evaluate our method, built on MAttNet, on three public datasets: RefCOCO, RefCOCO+, and RefCOCOg. We show that our predictors allow the grounding system to learn from objects without labeled queries and yield a 34.9\% relative accuracy improvement when evaluated with detection results.
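The abstract does not give implementation details, so the following is only a minimal sketch of the LSEP idea: small predictors that map an unlabeled object's visual and location features into the grounding model's language-embedding space, producing surrogate query embeddings. The module name, layer sizes, and the 5-d box feature are assumptions for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class LSEP(nn.Module):
    """Illustrative location and subject embedding predictors (hypothetical sketch).

    Assumes each object is described by a pooled region feature (vis_dim) and a
    5-d normalized box location feature, and that the grounding model's language
    embeddings live in a lang_dim-dimensional space.
    """

    def __init__(self, vis_dim=2048, loc_dim=5, lang_dim=512):
        super().__init__()
        # Predicts a "location phrase" embedding from box geometry.
        self.loc_predictor = nn.Sequential(
            nn.Linear(loc_dim, 256), nn.ReLU(), nn.Linear(256, lang_dim)
        )
        # Predicts a "subject phrase" embedding from visual features.
        self.subj_predictor = nn.Sequential(
            nn.Linear(vis_dim, 1024), nn.ReLU(), nn.Linear(1024, lang_dim)
        )

    def forward(self, vis_feat, loc_feat):
        # Surrogate language embeddings for objects with no annotated query.
        return self.subj_predictor(vis_feat), self.loc_predictor(loc_feat)


# Usage on a batch of detected objects that have no annotated queries:
lsep = LSEP()
vis = torch.randn(8, 2048)           # pooled region features from a detector
loc = torch.randn(8, 5)              # e.g. [x1/W, y1/H, x2/W, y2/H, area ratio]
subj_emb, loc_emb = lsep(vis, loc)   # pseudo subject / location query embeddings
print(subj_emb.shape, loc_emb.shape) # torch.Size([8, 512]) torch.Size([8, 512])
```

In such a setup, the predicted embeddings would stand in for the missing annotated queries, letting the grounding loss be applied to every detected object rather than only the annotated ones.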