Paper Title

Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Authors

Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang

Abstract

Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between target and expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, which makes it difficult to distinguish the target from other objects of the same category. To address these challenges, we design an entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction. In entity enhancement, we calculate semantic similarity as supervision to select the candidate proposals. Adaptive grounding calculates the ranking score of candidate proposals based on subject, location, and context with hierarchical attention. Collaborative reconstruction measures the ranking result from three perspectives: adaptive reconstruction, language reconstruction, and attribute classification. The adaptive mechanism helps to alleviate the variance of different referring expressions. Experiments on five datasets show that EARN outperforms existing state-of-the-art methods. Qualitative results demonstrate that the proposed EARN can better handle the situation where multiple objects of a particular category are situated together.
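
To make the first two modules more concrete, here is a minimal sketch of the idea, not the authors' implementation. The abstract does not specify how semantic similarity is computed or how the three cues are combined, so everything below is an assumption for illustration: the cosine similarity between label and entity vectors, the top-k selection, the softmax weighting, and the function names `select_candidates` and `adaptive_score` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_candidates(proposal_label_vecs, entity_vec, k):
    """Assumed form of entity enhancement: keep the k proposals whose
    category-label vector is most similar to the expression's entity vector."""
    scored = [(cosine(vec, entity_vec), idx)
              for idx, vec in enumerate(proposal_label_vecs)]
    # Highest similarity first; ties are broken by the lower proposal index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_score(subject, location, context, weight_logits):
    """Assumed form of adaptive grounding: combine the three cue scores with
    expression-dependent weights (here, a softmax over three logits)."""
    w_subj, w_loc, w_ctx = softmax(weight_logits)
    return w_subj * subject + w_loc * location + w_ctx * context


# Toy usage: three proposals with 2-D label vectors, one entity vector.
label_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidates = select_candidates(label_vecs, [1.0, 0.0], k=2)
score = adaptive_score(0.3, 0.6, 0.9, [0.0, 0.0, 0.0])
```

In this toy setup the weights depend only on the supplied logits; in a model along the lines the abstract describes, they would presumably be predicted from the expression so that different expressions can emphasize subject, location, or context differently.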
