Paper Title

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

Authors

Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

Abstract

Temporal sentence grounding (TSG) is crucial and fundamental for video understanding. Although existing methods train well-designed deep networks on large amounts of data, we find that they easily forget cases that appear rarely during training because of the imbalanced data distribution, which hurts model generalization and leads to undesirable performance. To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely seen content in TSG tasks. Specifically, MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module. We first align the given video-query pair with a cross-modal graph convolutional network, and then use a memory module to record the cross-modal shared semantic features in a domain-specific persistent memory. During training, the memory slots are dynamically associated with both common and rare cases, alleviating the forgetting issue. In testing, the rare cases can thus be enhanced by retrieving the stored memories, resulting in better generalization. Finally, the heterogeneous attention module integrates the enhanced multi-modal features in both the video and query domains. Experimental results on three benchmarks show the superiority of our method in both effectiveness and efficiency: it substantially improves accuracy not only on the entire dataset but also on rare cases.
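The abstract does not spell out how stored memories are retrieved at test time. As a rough illustration only, the sketch below assumes a common memory-network read: cosine-similarity addressing over the slots, a softmax over the scores, and a residual addition of the retrieved vector. The names `read_memory`, `temperature`, and the toy slot values are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(feature, memory, temperature=0.1):
    """Enhance a feature with a weighted read over memory slots.

    Assumption: addressing is softmax over cosine similarity, and the
    retrieved vector is added back to the input (residual enhancement).
    """
    weights = softmax([cosine(feature, slot) / temperature for slot in memory])
    retrieved = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(feature))
    ]
    enhanced = [f + r for f, r in zip(feature, retrieved)]
    return enhanced, weights


# Toy memory with three slots (hypothetical values).
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A feature from a "rare" case that lies closest to the second slot.
rare_feature = [0.1, 0.9, 0.0]
enhanced, weights = read_memory(rare_feature, memory)
```

In this toy setup the second slot receives almost all of the attention weight, so the enhanced feature is pulled toward the stored pattern. A trained model would learn the slots and would likely use a different similarity and update rule.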
