Paper Title

Active Gaze Control for Foveal Scene Exploration

Authors

Dias, Alexandre M. F.; Simões, Luís; Moreno, Plinio; Bernardino, Alexandre

Abstract

Active perception and foveal vision are the foundations of the human visual system. While foveal vision reduces the amount of information to process during a gaze fixation, active perception shifts the gaze direction toward the most promising parts of the visual field. We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene, identifying the objects present in their surroundings with the fewest possible gaze shifts. Our approach is based on three key methods. First, we take an off-the-shelf deep object detector, pre-trained on a large dataset of regular images, and calibrate its classification outputs to the case of foveated images. Second, a body-centered semantic map, encoding the object classifications and their corresponding uncertainties, is sequentially updated with the calibrated detections, considering several data fusion techniques. Third, the next best gaze fixation point is determined based on information-theoretic metrics that aim at minimizing the overall expected uncertainty of the semantic map. Compared to random selection of the next gaze shift, the proposed method increases the detection F1-score by 2-3 percentage points for the same number of gaze shifts and reduces the number of gaze shifts required to attain similar performance to one third.
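
The map-update and gaze-selection steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion rules or the information-theoretic metrics, so the sketch assumes naive Bayesian fusion of per-class probabilities and a greedy "look at the highest-entropy cell" rule, which is a simpler stand-in for minimizing the expected uncertainty of the map. The function names (`fuse_bayes`, `entropy`, `next_fixation`), the map layout, and the detection values are all hypothetical.

```python
import numpy as np

def fuse_bayes(prior, likelihood):
    """Assumed fusion rule: multiply per-class probabilities and renormalize."""
    posterior = prior * likelihood
    return posterior / posterior.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) along the last axis."""
    return -(p * np.log(p + eps)).sum(axis=-1)

def next_fixation(semantic_map):
    """Greedy stand-in for the gaze policy: pick the most uncertain cell."""
    return int(np.argmax(entropy(semantic_map)))

# Toy semantic map: 4 cells, 3 classes, each cell starting from a uniform belief.
semantic_map = np.full((4, 3), 1.0 / 3.0)

# Hypothetical calibrated detector outputs for cells 0 and 2.
semantic_map[0] = fuse_bayes(semantic_map[0], np.array([0.8, 0.1, 0.1]))
semantic_map[2] = fuse_bayes(semantic_map[2], np.array([0.5, 0.4, 0.1]))

# Cells 1 and 3 are still uniform, so one of them is chosen next.
target = next_fixation(semantic_map)
```

In this sketch, cells that have already received a confident detection have lower entropy, so the policy steers the next fixation toward cells that have not yet been observed.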
