L3MVN: Leveraging Large Language Models for Visual Target Navigation

Times cited: 10
Authors
Yu, Bangguo [1]
Kasaei, Hamidreza [1]
Cao, Ming [1]
Affiliation
[1] Univ Groningen, Fac Sci & Engn, Ne, NL-9747 AG Groningen, Netherlands
DOI
10.1109/IROS55552.2023.10342512
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches, robots still lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task learn these priors during training, which typically requires significant resources and time. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLMs) to impart common sense for object search. Specifically, we introduce two paradigms, (i) a zero-shot approach and (ii) a feed-forward approach, both of which use language to select the relevant frontier from the semantic map as a long-term goal and thereby explore the environment efficiently. Our analysis demonstrates notable zero-shot generalization and transfer capabilities arising from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real-robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed at https://sites.google.com/view/l3mvn.
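The frontier-selection idea summarized in the abstract (score each frontier on the semantic map by how likely the target is to be found near it, then take the best one as the long-term goal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Frontier`, `build_prompt`, `make_table_scorer`, and `select_long_term_goal` are hypothetical names, and a hand-written object/target co-occurrence table stands in for the language model that L3MVN actually queries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer type: maps (objects seen near a frontier, target) to a
# relevance score. In an LLM-based system this would wrap a language-model
# query; in this sketch a hand-written co-occurrence table stands in for it.
Scorer = Callable[[Sequence[str], str], float]


@dataclass
class Frontier:
    """A frontier on the semantic map (illustrative structure, not L3MVN's)."""
    position: Tuple[int, int]  # (row, col) cell on the map grid
    nearby_objects: List[str] = field(default_factory=list)


def build_prompt(objects: Sequence[str], target: str) -> str:
    """Describe a frontier's surroundings as a natural-language query."""
    listed = ", ".join(objects) if objects else "no known objects"
    return f"A room containing {listed}. Is a {target} likely to be here?"


def make_table_scorer(table: Dict[Tuple[str, str], float]) -> Scorer:
    """Build a toy scorer that averages object/target co-occurrence values."""
    def score(objects: Sequence[str], target: str) -> float:
        if not objects:
            return 0.0  # nothing observed near this frontier yet
        return sum(table.get((o, target), 0.0) for o in objects) / len(objects)
    return score


def select_long_term_goal(frontiers: Sequence[Frontier], target: str,
                          score_fn: Scorer) -> Frontier:
    """Pick the frontier whose surroundings score highest for the target."""
    if not frontiers:
        raise ValueError("no frontiers available")
    return max(frontiers, key=lambda f: score_fn(f.nearby_objects, target))


if __name__ == "__main__":
    # Made-up co-occurrence values, purely for illustration.
    table = {("sink", "toilet"): 0.6, ("bathtub", "toilet"): 0.9,
             ("sofa", "toilet"): 0.05, ("tv", "toilet"): 0.05}
    frontiers = [Frontier((10, 4), ["sofa", "tv"]),
                 Frontier((2, 30), ["sink", "bathtub"])]
    goal = select_long_term_goal(frontiers, "toilet", make_table_scorer(table))
    print(goal.position)  # (2, 30)
    print(build_prompt(goal.nearby_objects, "toilet"))
```

Keeping the scorer behind a single callable means the toy table could, under these assumptions, be swapped for a function that sends `build_prompt(...)` to a language model and returns its score, without changing the selection logic.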
Pages: 3554-3560
Number of pages: 7