Biologically Inspired Visual System Architecture for Object Recognition in Autonomous Systems

Paper Title

Biologically Inspired Visual System Architecture for Object Recognition in Autonomous Systems

Authors

Dan Malowany, Hugo Guterman

Abstract

Recent findings on the sensitivity of convolutional neural networks to additive noise, lighting conditions, and the completeness of the training dataset indicate that this technology still lacks the robustness needed by the autonomous robotics industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system were analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include the continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when the visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its predictions and the visual feedback. Convolutional neural networks are feed-forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, that information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with state-of-the-art bottom-up object recognition models, e.g., deep convolutional neural networks. The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms, together with state-of-the-art bottom-up feed-forward algorithms, resulted in an accurate, robust, and continuously improving target recognition model.
