L3MVN: Leveraging Large Language Models for Visual Target Navigation

Cited by: 10
Authors
Yu, Bangguo [1]
Kasaei, Hamidreza [1]
Cao, Ming [1]
Affiliations
[1] Univ Groningen, Fac Sci & Engn, Ne, NL-9747 AG Groningen, Netherlands
Keywords
DOI
10.1109/IROS55552.2023.10342512
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches, robots still lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task learn such priors during training and typically require significant resources and time to do so. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLMs) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities that come from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.
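The frontier-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `Frontier`, `build_prompt`, `select_frontier`, `TOY_PRIORS`, and `make_toy_scorer` are hypothetical names introduced here, and the toy scorer is only a stand-in for a real language model, which would assign a likelihood or score to each prompt. The sketch assumes each frontier comes with the object labels seen near it in the semantic map.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Frontier:
    """Boundary between explored and unexplored map cells (hypothetical structure)."""
    center: Tuple[int, int]     # map cell at the frontier centre
    nearby_objects: List[str]   # object labels seen near this frontier


def build_prompt(nearby_objects: List[str], target: str) -> str:
    """Describe a frontier in words so a language model can judge its relevance."""
    seen = ", ".join(nearby_objects) if nearby_objects else "nothing"
    return f"A room containing {seen} probably also contains a {target}."


def select_frontier(
    frontiers: List[Frontier],
    target: str,
    score_fn: Callable[[str], float],
) -> Frontier:
    """Return the frontier whose prompt receives the highest score."""
    if not frontiers:
        raise ValueError("no frontiers to choose from")
    return max(
        frontiers,
        key=lambda f: score_fn(build_prompt(f.nearby_objects, target)),
    )


# Assumption: a tiny hand-written prior that replaces the language model.
TOY_PRIORS = {
    "toilet": {"sink", "towel"},
    "bed": {"nightstand", "wardrobe"},
}


def make_toy_scorer(target: str) -> Callable[[str], float]:
    """Build a scorer that counts prompt words related to the target."""
    related = TOY_PRIORS.get(target, set())

    def score(prompt: str) -> float:
        words = prompt.replace(",", " ").split()
        return float(sum(word in related for word in words))

    return score


frontiers = [
    Frontier(center=(10, 4), nearby_objects=["sofa", "tv"]),
    Frontier(center=(2, 7), nearby_objects=["sink", "towel"]),
]
best = select_frontier(frontiers, "toilet", make_toy_scorer("toilet"))
```

In this toy run the frontier near the sink and towel is chosen as the long-term goal for the target "toilet". Passing the scorer in as a function keeps the selection logic unchanged if the stand-in is replaced by a call to an actual language model.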
Pages: 3554-3560
Number of pages: 7
Related papers
50 in total
  • [1] Leveraging Large Language Models for Effective Organizational Navigation
    Chandrasekar, Haresh
    Gupta, Srishti
    Liu, Chun-Tzu
    Tsai, Chun-Hua
    PROCEEDINGS OF THE 25TH ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH, DGO 2024, 2024, : 1020 - 1022
  • [2] Leveraging large language models for autonomous robotic mapping and navigation
    Espada, Jordan Pascual
    Qiu, Sofia Yiyu
    Crespo, Ruben Gonzalez
    Carus, Juan Luis
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2025, 22 (02):
  • [3] Enhancing Large Language Models with RAG for Visual Language Navigation in Continuous Environments
    Bao, Xiaoan
    Lv, Zhiqiang
    Wu, Biao
    ELECTRONICS, 2025, 14 (05):
  • [4] Leveraging large language models in dermatology
    Matin, Rubeta N.
    Linos, Eleni
    Rajan, Neil
    BRITISH JOURNAL OF DERMATOLOGY, 2023, 189 (03) : 253 - 254
  • [5] Viewpoint Estimation for Visual Target Navigation by Leveraging Keypoint Detection
    Choi, Yunho
    Kim, Nuri
    Park, Jeongho
    Oh, Songhwai
    2020 20TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2020, : 1162 - 1165
  • [6] Leveraging large language models for predictive chemistry
    Jablonka, Kevin Maik
    Schwaller, Philippe
    Ortega-Guerrero, Andres
    Smit, Berend
    NATURE MACHINE INTELLIGENCE, 2024, 6 : 161 - 169
  • [7] Leveraging Large Language Models for Tradespace Exploration
    Apaza, Gabriel
    Selva, Daniel
    JOURNAL OF SPACECRAFT AND ROCKETS, 2024, 61 (05) : 1165 - 1183
  • [8] Leveraging Large Language Models for Sequential Recommendation
    Harte, Jesse
    Zorgdrager, Wouter
    Louridas, Panos
    Katsifodimos, Asterios
    Jannach, Dietmar
    Fragkoulis, Marios
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 1096 - 1102
  • [9] Leveraging large language models for predictive chemistry
    Jablonka, Kevin Maik
    Schwaller, Philippe
    Ortega-Guerrero, Andres
    Smit, Berend
    NATURE MACHINE INTELLIGENCE, 2024, 6 (02) : 122 - 123
  • [10] Leveraging Large Language Models for Automated Dialogue Analysis
    Finch, Sarah E.
    Paek, Ellie S.
    Choi, Jinho D.
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023, : 202 - 215