Paper Title

GoToNet: Fast Monocular Scene Exposure and Exploration

Paper Authors

Tom Avrech, Evgenii Zheltonozhskii, Chaim Baskin, Ehud Rivlin

Paper Abstract

Autonomous scene exposure and exploration, especially in localization- or communication-denied areas, which is useful for finding targets in unknown scenes, remains a challenging problem in autonomous navigation. In this work, we present a novel method for real-time environment exploration whose only requirements are a visually similar dataset for pre-training, sufficient lighting in the scene, and an on-board forward-looking RGB camera for environmental sensing. In contrast to existing methods, our method requires only one look (image) to make a good tactical decision, and therefore operates in constant, non-growing time. Two direction predictions, characterized by pixels dubbed the Goto and Lookat pixels, comprise the core of our method. These pixels encode the recommended flight instructions as follows: the Goto pixel defines the direction in which the agent should move by one distance unit, and the Lookat pixel defines the direction in which the camera should point in the next step. These flying-instruction pixels are optimized to expose the largest amount of currently unexplored area. Our method presents a novel deep-learning-based navigation approach that is able to solve this problem and demonstrate its ability in an even more complicated setup, i.e., when computational power is limited. In addition, we propose a way to generate a navigation-oriented dataset, enabling efficient training of our method on RGB and depth images. Tests conducted in a simulator, evaluating both the sparse pixel-coordinate inference process and 2D and 3D test flights aimed at unveiling areas and decreasing distances to targets, achieve promising results. A comparison against a state-of-the-art algorithm shows that our method outperforms it on the new voxels per camera pose, minimum distance to target, percentage of surface voxels seen, and compute time metrics.
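
To make the Goto/Lookat encoding concrete, below is a minimal sketch (not the authors' implementation) of how the two predicted pixels could be converted into a motion command and a gaze command by back-projecting them through a standard pinhole camera model. The intrinsic matrix K, the example pixel coordinates, and the helper pixel_to_direction are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pixel_to_direction(pixel_xy, K):
    """Back-project a pixel onto a unit direction ray in the camera frame
    using the pinhole model: d is proportional to K^{-1} [u, v, 1]^T."""
    u, v = pixel_xy
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

# Hypothetical intrinsics for a 640x480 forward-looking RGB camera.
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Stand-ins for the network's two pixel predictions on the current image.
goto_px, lookat_px = (350, 220), (300, 260)

step_size = 1.0  # the abstract's "one distance unit"
move_cmd = step_size * pixel_to_direction(goto_px, K)  # translation for this step
gaze_dir = pixel_to_direction(lookat_px, K)            # camera heading for the next step
print(move_cmd, gaze_dir)
```

Under this reading, the agent translates one unit along the Goto ray, re-orients the camera along the Lookat ray, and then runs the next single-image inference, which is why the per-step cost stays constant.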
