Paper Title
Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter
Paper Authors
Paper Abstract
Real human attention is an interactive activity between the visual system and the brain, drawing on both low-level visual stimuli and high-level semantic information. Previous image salient object detection (SOD) works make their saliency predictions in a multi-task manner, i.e., they perform pixel-wise saliency regression and segmentation-like saliency refinement at the same time, which degrades the ability of their feature backbones to reveal semantic information. However, given an image, we tend to pay more attention to regions that are semantically salient, even when these regions are not the most perceptually salient ones at first glance. In this paper, we divide the SOD problem into two sequential tasks: 1) we propose a lightweight, weakly supervised deep network to coarsely locate the semantically salient regions first; 2) then, as a post-processing procedure, we selectively fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement. In sharp contrast to state-of-the-art (SOTA) methods, which focus mainly on learning pixel-wise saliency within a "single image" using perceptual clues, our method investigates the "object-level semantic ranks between multiple images", a methodology that is more consistent with the real human attention mechanism. Our method is simple yet effective, and it is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
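The second stage (selective fusion of off-the-shelf saliency maps inside a coarsely located region) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `mass_inside` and `selective_fuse`, the `min_agreement` threshold, and the selection rule itself (keep a model only if most of its saliency mass falls inside the coarse region, then average the kept maps). The abstract does not specify the actual selection criterion.

```python
from typing import List

Map = List[List[float]]  # a 2-D saliency map with values in [0, 1]


def mass_inside(saliency: Map, region: List[List[int]]) -> float:
    """Fraction of a map's total saliency that falls inside the coarse region."""
    total = sum(sum(row) for row in saliency)
    if total == 0:
        return 0.0
    inside = sum(s * r
                 for s_row, r_row in zip(saliency, region)
                 for s, r in zip(s_row, r_row))
    return inside / total


def selective_fuse(maps: List[Map], region: List[List[int]],
                   min_agreement: float = 0.5) -> Map:
    """Average only the maps that agree with the coarse region, then mask.

    Hypothetical selection rule: a map is kept when at least
    `min_agreement` of its saliency mass lies inside `region`.
    """
    kept = [m for m in maps if mass_inside(m, region) >= min_agreement]
    if not kept:  # fall back to all maps if none agrees with the region
        kept = maps
    h, w = len(region), len(region[0])
    return [[region[y][x] * sum(m[y][x] for m in kept) / len(kept)
             for x in range(w)]
            for y in range(h)]


# Coarse region from a (hypothetical) stage-one locator: 1 = semantically salient.
region = [[1, 1, 0],
          [1, 1, 0]]
model_a = [[0.9, 0.8, 0.1], [0.7, 0.9, 0.0]]  # agrees with the region
model_b = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]  # fires mostly outside it
fused = selective_fuse([model_a, model_b], region)
```

In this toy input, `model_b` places most of its saliency outside the coarse region, so only `model_a` contributes to the fused map, and pixels outside the region are set to zero.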