Paper Title
Context Sensing Attention Network for Video-based Person Re-identification
Paper Authors
Paper Abstract
Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames. Recent approaches handle this problem using temporal aggregation strategies. In this work, we propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction and temporal aggregation steps. First, we introduce the Context Sensing Channel Attention (CSCA) module, which emphasizes responses from informative channels for each frame. These informative channels are identified with reference not only to each individual frame, but also to the content of the entire sequence. Therefore, CSCA explores both the individuality of each frame and the global context of the sequence. Second, we propose the Contrastive Feature Aggregation (CFA) module, which predicts frame weights for temporal aggregation. Here, the weight for each frame is determined in a contrastive manner: i.e., not only by the quality of each individual frame, but also by the average quality of the other frames in a sequence. Therefore, it effectively promotes the contribution of relatively good frames. Extensive experimental results on four datasets show that CSA-Net consistently achieves state-of-the-art performance.
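The two ideas in the abstract — channel attention conditioned on both the frame and the whole sequence, and frame weights computed contrastively against the other frames — can be sketched as follows. This is a minimal numpy illustration of the concepts only, not the paper's implementation: the pooled descriptors, the single projection matrix `w`, the use of the temporal mean as "sequence context", and the scalar quality scores are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_sensing_channel_attention(frames, w):
    """Re-weight each frame's channels using both the frame itself and a
    sequence-level context (here, simply the temporal mean descriptor).

    frames: (T, C) per-frame channel descriptors (e.g. pooled feature maps)
    w:      (2C, C) projection mapping [frame; context] to channel gates
    """
    context = frames.mean(axis=0, keepdims=True)                 # (1, C) global context
    joint = np.concatenate(
        [frames, np.repeat(context, frames.shape[0], axis=0)],   # (T, 2C)
        axis=1)
    gates = sigmoid(joint @ w)                                   # (T, C) channel attention
    return frames * gates                                        # emphasized channels

def contrastive_frame_weights(quality):
    """Weight each frame by its quality relative to the average quality
    of the *other* frames in the sequence (leave-one-out comparison).

    quality: (T,) scalar quality score per frame
    """
    T = quality.shape[0]
    others_mean = (quality.sum() - quality) / (T - 1)            # mean of the other frames
    return softmax(quality - others_mean)                        # relatively good frames win

# Toy usage with random features standing in for CNN outputs.
rng = np.random.default_rng(0)
T, C = 4, 8
frames = rng.standard_normal((T, C))
w = rng.standard_normal((2 * C, C)) * 0.1
refined = context_sensing_channel_attention(frames, w)           # (T, C)
weights = contrastive_frame_weights(rng.standard_normal(T))      # (T,) sums to 1
clip_feature = weights @ refined                                 # (C,) aggregated descriptor
```

The leave-one-out mean in `contrastive_frame_weights` is one plausible reading of "the average quality of the other frames"; a frame scores highly only if it is better than its companions, which matches the stated goal of promoting relatively good frames.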