Paper Title

Indicative Image Retrieval: Turning Blackbox Learning into Grey

Paper Authors

Zhang, Xulu; Yang, Zhenqun; Tian, Hao; Li, Qing; Wei, Xiaoyong

Paper Abstract

Deep learning became the game changer for image retrieval soon after it was introduced. It promotes feature extraction (by representation learning) to the core of image retrieval, while relevance/matching evaluation degenerates into simple similarity metrics. In many applications, however, we need the matching evidence to be indicated rather than just receiving a ranked list (e.g., the locations of the target proteins/cells/lesions in medical images), much as matched words are highlighted in search engines. This is not easy to implement without explicit relevance/matching modeling, and deep representation learning models are unsuitable for it because of their blackbox nature. In this paper, we revisit the importance of relevance/matching modeling in the deep learning era with an indicative retrieval setting. The study shows that it is possible to skip representation learning and model the matching evidence directly. By removing the dependency on pre-trained models, this avoids many related issues (e.g., the domain gap between classification and retrieval, the detail diffusion caused by convolution, and so on). More importantly, the study demonstrates that the matching can be explicitly modeled and backtracked later to generate matching-evidence indications, which improves the explainability of deep inference. Our method achieves the best performance in the literature on both Oxford-5k and Paris-6k, setting a new record of 97.77% on Oxford-5k (97.81% on Paris-6k) without extracting any deep features.
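To make the "model the matching explicitly, then backtrack it for evidence" idea concrete, here is a minimal PyTorch sketch of one way such a pipeline could look. This is our illustration only, not the paper's architecture: the PairwiseMatcher network, its layers, and the input-gradient saliency used for the backtracking step are all assumptions (the paper in fact avoids pre-trained features and notes detail diffusion caused by convolution, so its actual matching model differs).

```python
# A hedged sketch: score a (query, candidate) pair directly from raw pixels
# (no pre-extracted deep features), then backtrack the score to the
# candidate's pixels to obtain a matching-evidence heatmap.
import torch
import torch.nn as nn

class PairwiseMatcher(nn.Module):
    """Illustrative pair scorer: maps a stacked image pair to a match score."""
    def __init__(self):
        super().__init__()
        # The two RGB images are stacked along the channel axis (3 + 3 = 6).
        self.trunk = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)  # scalar matching score

    def forward(self, query, candidate):
        pair = torch.cat([query, candidate], dim=1)
        return self.head(self.trunk(pair).flatten(1))

model = PairwiseMatcher()
query = torch.rand(1, 3, 128, 128)
candidate = torch.rand(1, 3, 128, 128, requires_grad=True)

# Matching is modeled explicitly as a learned score over the pair,
# rather than a similarity metric between separately extracted features.
score = model(query, candidate)

# "Backtracking": propagate the score back to the candidate's pixels and
# use the gradient magnitude as an indication of the matching evidence.
score.sum().backward()
evidence = candidate.grad.abs().sum(dim=1)  # heatmap of shape (1, 128, 128)
```

The design point the sketch tries to capture is that the evidence map falls out of the matching model itself: because the score is computed directly on the image pair, it can be traced back to the pixels that support the match, which is not possible when retrieval is reduced to comparing blackbox feature vectors.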
