Paper Title
SD-RSIC: Summarization Driven Deep Remote Sensing Image Captioning
Paper Authors
Paper Abstract
Deep neural networks (DNNs) have recently become popular for image captioning problems in remote sensing (RS). Existing DNN-based approaches rely on the availability of a training set composed of a large number of RS images annotated with captions. However, the captions of training images may contain redundant information (they can be repetitive or semantically similar to each other), which leads to an information deficiency when learning the mapping from the image domain to the language domain. To overcome this limitation, in this paper we present a novel Summarization Driven Remote Sensing Image Captioning (SD-RSIC) approach. The proposed approach consists of three main steps. The first step obtains the standard image captions by jointly exploiting convolutional neural networks (CNNs) with long short-term memory (LSTM) networks. The second step, unlike existing RS image captioning methods, summarizes the ground-truth captions of each training image into a single caption by exploiting sequence-to-sequence neural networks, eliminating the redundancy present in the training set. The third step automatically defines adaptive weights associated with each RS image to combine the standard captions with the summarized captions based on the semantic content of the image. This is achieved by a novel adaptive weighting strategy defined in the context of LSTM networks. Experimental results obtained on the RSICD, UCM-Captions and Sydney-Captions datasets show the effectiveness of the proposed approach compared to state-of-the-art RS image captioning approaches. The code of the proposed approach is publicly available at https://gitlab.tubit.tu-berlin.de/rsim/SD-RSIC.
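To make the third step concrete, the following PyTorch snippet is a minimal sketch of one plausible form of the adaptive weighting idea described in the abstract; it is not the authors' implementation. All names and shapes here (AdaptiveCaptionCombiner, hidden_dim, the sigmoid gate over the LSTM hidden state) are our own assumptions: at each decoding step, a scalar weight in [0, 1] computed from the decoder hidden state forms a convex combination of the standard-caption and summarized-caption word distributions.

```python
# Hypothetical sketch of adaptive weighting between two caption branches.
# Assumption: the gate is a learned linear layer over the LSTM hidden state,
# squashed by a sigmoid; the paper may define the weighting differently.
import torch
import torch.nn as nn


class AdaptiveCaptionCombiner(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Scalar gate computed from the decoder's hidden state at step t.
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, hidden, p_standard, p_summary):
        # hidden:     (batch, hidden_dim)  LSTM hidden state at step t
        # p_standard: (batch, vocab_size)  standard-caption word distribution
        # p_summary:  (batch, vocab_size)  summarized-caption word distribution
        w = torch.sigmoid(self.gate(hidden))           # (batch, 1), in [0, 1]
        return w * p_standard + (1.0 - w) * p_summary  # convex combination


if __name__ == "__main__":
    batch, hidden_dim, vocab_size = 4, 512, 1000
    combiner = AdaptiveCaptionCombiner(hidden_dim)
    h = torch.randn(batch, hidden_dim)
    p_std = torch.softmax(torch.randn(batch, vocab_size), dim=-1)
    p_sum = torch.softmax(torch.randn(batch, vocab_size), dim=-1)
    p = combiner(h, p_std, p_sum)
    print(p.shape, p.sum(dim=-1))  # each row still sums to ~1.0
```

Because the gate output lies in [0, 1] and the two inputs are valid probability distributions, their convex combination remains a valid distribution per sample, so the blended output can be fed directly into a standard cross-entropy captioning loss.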