Goal Object Grounding and Multimodal Mapping for Multi-object Visual Navigation

Cited by: 0
Authors
Choi J. [1]
Kim I. [1]
Affiliations
[1] Department of Computer Science, Kyonggi University
Keywords
deep reinforcement learning; global mapping; goal grounding; multi-object visual navigation; reward function
DOI
10.5302/J.ICROS.2024.23.0217
Abstract
Multi-object visual navigation (MultiON) is a visual navigation task in which an embodied agent must visit multiple goal objects, distributed over an unseen three-dimensional (3D) environment, in a predefined order. To execute MultiON successfully, an agent must accurately ground each goal object from a language description of its color and shape attributes, and it must build a semantically rich map that covers the entire environment. In this paper, we propose a novel deep neural network-based agent model for MultiON tasks. The proposed model addresses three issues in MultiON agent design. First, it adopts the pre-trained Grounding DINO module to ground the language descriptions of goal objects to the visual objects in input images in a zero-shot manner. Second, it uses Bayesian posterior probabilities to register the uncertain local contexts extracted from input images onto the global map. Finally, it applies a novel reward function that motivates the agent to explore unvisited areas of the environment, so that the map expands rapidly and accurately. We demonstrate the superiority of the proposed model through quantitative and qualitative experiments using the 3D simulation platform AI-Habitat and the benchmark scene dataset Matterport3D. © ICROS 2024.
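The abstract does not give the exact formulation of the Bayesian map registration, so the following is only a minimal sketch of the general idea: a generic recursive Bayesian update of per-cell class beliefs, assuming conditionally independent observations. The function name `bayes_update`, the 2x2 grid, the three classes, and the likelihood values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Return the posterior over classes: posterior is proportional to likelihood * prior."""
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum(axis=-1, keepdims=True)

# Hypothetical global map: 2x2 cells, 3 classes (background, goal A, goal B),
# initialized with a uniform prior in every cell.
num_classes = 3
global_map = np.full((2, 2, num_classes), 1.0 / num_classes)

# Hypothetical (assumed) detector likelihood for one observed cell,
# favoring "goal A" but with some uncertainty.
obs_likelihood = np.array([0.2, 0.7, 0.1])

# Register the same uncertain observation twice onto cell (0, 1).
for _ in range(2):
    global_map[0, 1] = bayes_update(global_map[0, 1], obs_likelihood)

print(global_map[0, 1])  # belief for the observed cell
print(global_map[0, 0])  # unobserved cell keeps its uniform prior
```

In this sketch, repeated consistent observations sharpen the belief in the observed cell while unobserved cells keep their prior, which is one way uncertain local detections can be accumulated on a global map.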
Pages: 596-606
Number of pages: 10
Related papers
50 in total
  • [21] Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation
    Marza, Pierre
    Matignon, Laetitia
    Simonin, Olivier
    Wolf, Christian
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 1725 - 1732
  • [22] Multi-Object Navigation Using Potential Target Position Policy Function
    Zeng, Haitao
    Song, Xinhang
    Jiang, Shuqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2608 - 2619
  • [23] Skill Fusion in Hybrid Robotic Framework for Visual Object Goal Navigation
    Staroverov, Aleksei
    Muravyev, Kirill
    Yakovlev, Konstantin
    Panov, Aleksandr I.
    ROBOTICS, 2023, 12 (04)
  • [24] Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
    Xu, Nuo
    Wang, Wen
    Yang, Rong
    Qin, Mengjie
    Lin, Zheyuan
    Song, Wei
    Zhang, Chunlong
    Gu, Jason
    Li, Chao
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 5214 - 5220
  • [25] Extending IOU Based Multi-Object Tracking by Visual Information
    Bochinski, Erik
    Senst, Tobias
    Sikora, Thomas
    2018 15TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2018, : 435 - 440
  • [26] Height estimation algorithm based on visual multi-object tracking
    School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
    Not specified, Liaoning 116600, China
    Tien Tzu Hsueh Pao, 3 (591-596)
  • [27] MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
    Wani, Saim
    Patel, Shivansh
    Jain, Unnat
    Chang, Angel X.
    Savva, Manolis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [28] Multi-object Visual Tracking for Indoor Images of Retail Consumers
    Panagos, Iason-Ioannis
    Giotis, Angelos P.
    Nikou, Christophoros
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,
  • [29] Multi-object tracking with robust object regression and association
    Li, Yi-Fan
    Ji, Hong-Bing
    Chen, Xi
    Lai, Yu-Kun
    Yang, Yong-Liang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 227
  • [30] INTERACTIVE MULTI-OBJECT TRACKING FOR VIRTUAL OBJECT MANIPULATION
    Guo, Yibo
    Yang, Michael Ying
    Rosenhahn, Bodo
    ISA13 - THE ISPRS WORKSHOP ON IMAGE SEQUENCE ANALYSIS 2013, 2013, II-3/W2 : 19 - 24