Paper title
MemoNav: Selecting Informative Memories for Visual Navigation
Paper authors
Paper abstract
Image-goal navigation is a challenging task because it requires the agent to navigate to a target, indicated by an image, in a previously unseen scene. Current methods introduce diverse memory mechanisms that save navigation history to solve this task. However, these methods use all observations in the memory to generate navigation actions, without considering which fraction of this memory is informative. To address this limitation, we present MemoNav, a novel memory mechanism for image-goal navigation that retains the agent's informative short-term memory and long-term memory to improve navigation performance on a multi-goal task. The node features on the agent's topological map are stored in the short-term memory, as these features are dynamically updated. To aid the short-term memory, we also generate long-term memory by continuously aggregating the short-term memory via a graph attention module. MemoNav retains the informative fraction of the short-term memory via a forgetting module based on a Transformer decoder and then incorporates this retained short-term memory and the long-term memory into working memory. Lastly, the agent uses the working memory for action generation. We evaluate our model on a new multi-goal navigation dataset. The experimental results show that MemoNav outperforms state-of-the-art (SoTA) methods by a large margin while using a smaller fraction of the navigation history. The results also show empirically that our model is less likely to be trapped in a deadlock, which further validates that MemoNav improves the agent's navigation efficiency by reducing redundant steps.
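The memory flow described in the abstract (short-term node features, a long-term aggregate, a forgetting step, and a combined working memory) can be illustrated with a small toy sketch. This is not the authors' implementation: plain scaled dot-product attention stands in for both the graph attention module and the Transformer-decoder-based forgetting module, and all names (`forget`, `update_long_term`, `working_memory`, `keep_fraction`) are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query, nodes):
    """Scaled dot-product attention of one query vector over node features."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, n) / scale for n in nodes])

def forget(short_term, goal, keep_fraction=0.5):
    """Assumed stand-in for the forgetting module: keep only the nodes that
    receive the highest attention from the goal embedding."""
    weights = attention_weights(goal, short_term)
    n_keep = max(1, math.ceil(keep_fraction * len(short_term)))
    ranked = sorted(range(len(short_term)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original node order
    return [short_term[i] for i in kept], kept

def update_long_term(long_term, short_term):
    """Assumed stand-in for the graph attention aggregation: summarize the
    short-term memory into a single vector, queried by the old long-term state."""
    weights = attention_weights(long_term, short_term)
    return [sum(w * n[d] for w, n in zip(weights, short_term))
            for d in range(len(long_term))]

def working_memory(short_term, long_term, goal, keep_fraction=0.5):
    """Working memory = retained short-term nodes + the long-term vector."""
    retained, kept = forget(short_term, goal, keep_fraction)
    return retained + [long_term], kept

# Toy example: four 2-D node features and a goal pointing along the first axis.
short_term = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
goal = [1.0, 0.0]
long_term = update_long_term([0.0, 0.0], short_term)
memory, kept = working_memory(short_term, long_term, goal, keep_fraction=0.5)
print(kept, len(memory))
```

In this toy setting the two nodes most aligned with the goal are retained, so the action policy would see three vectors instead of five. A real system would presumably learn the attention parameters end to end rather than rank raw dot products.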