Guiding Attention using Partial-Order Relationships for Image Captioning

Paper Title

Guiding Attention using Partial-Order Relationships for Image Captioning

Authors

Popattia, Murad; Rafi, Muhammad; Qureshi, Rizwan; Nawaz, Shah

Abstract

The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions of images. Over the years, many novel approaches have been proposed to enhance the attention process using different feature representations. In this paper, we extend this line of work with a guided attention network mechanism that exploits the relationship between the visual scene and its text descriptions. The mechanism combines spatial features from the image, high-level information from the topics, and temporal context from caption generation, all of which are embedded together in an ordered embedding space. This embedding space is trained with a pairwise ranking objective, which allows similar images, topics, and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy and hence helps the model produce more visually accurate captions. Experimental results on the MSCOCO dataset show that our approach is competitive with many state-of-the-art models across various evaluation metrics.
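The abstract does not state the exact form of the pairwise ranking objective, so the following is only a minimal sketch of one common way to train an ordered embedding space: an order-violation penalty combined with a margin-based ranking loss. The function names, the convention that a more general item (for example a caption) has coordinate-wise smaller values than a more specific one (for example an image), and the margin value are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def order_violation(general: Vector, specific: Vector) -> float:
    """Squared penalty for breaking the assumed partial order.

    Assumed convention: `general` precedes `specific` when every coordinate
    of `general` is <= the matching coordinate of `specific`. The penalty is
    zero exactly when that holds.
    """
    return sum(max(0.0, g - s) ** 2 for g, s in zip(general, specific))


def pairwise_ranking_loss(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    margin: float = 0.05,  # hypothetical margin, not from the paper
) -> float:
    """Hinge loss: the matching pair should violate the order less than
    every mismatched pair, by at least `margin`."""
    pos_penalty = order_violation(anchor, positive)
    return sum(
        max(0.0, margin + pos_penalty - order_violation(anchor, neg))
        for neg in negatives
    )


# Toy 2-D embeddings (made-up numbers, for illustration only).
caption = [0.1, 0.2]
matching_image = [0.5, 0.4]    # dominates the caption in every coordinate
mismatched_image = [0.0, 0.1]  # falls below the caption in every coordinate

loss = pairwise_ranking_loss(caption, matching_image, [mismatched_image])
```

In this toy case the matching pair incurs no penalty, and the loss is positive only because the mismatched pair's penalty is still smaller than the margin; training would push mismatched pairs further out of order.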
