ChatNav: Leveraging LLM to Zero-Shot Semantic Reasoning in Object Navigation

Times cited: 0

Authors:

Zhu, Yong [1,2]

Wen, Zhenyu [1,2]

Li, Xiong [1,2]

Shi, Xiufang [1,2]

Wu, Xiang [1,2]

Dong, Hui [1,2]

Chen, Jiming [3,4]

Affiliations:

[1] Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310023, Peoples R China

[2] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China

[3] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Peoples R China

[4] Hangzhou Dianzi Univ, Hangzhou 310018, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2025 / Vol. 35 / No. 3

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

Semantics; Navigation; Robots; Cognition; TV; Accuracy; Chatbots; Large language models; Decision making; Pipelines; Object goal navigation; LLM; object clustering; prompt; gravity-repulsion model;

DOI:

10.1109/TCSVT.2024.3485907

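Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809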

Abstract:

In object goal navigation tasks, a robot's understanding of the semantic relationships in its environment is a key factor in its ability to localize target objects. Prior learning-based methods train robots on 3D scene datasets to learn these relationships, but they perform poorly in new environments with unfamiliar semantic contexts. In this paper, we propose ChatNav, which leverages the knowledge-summarizing and reasoning capabilities of a Large Language Model (LLM) for zero-shot inference of explicit semantic relationships. These relationships are then integrated into the navigation system to localize target objects efficiently. ChatNav employs a spatial object clustering algorithm to collect semantic clues and designs common-sense-based prompts for interacting with the LLM. It then uses a gravity-repulsion model to convert the inference results into heuristic factors for robust navigation decision-making. Our approach requires no additional training and consistently obtains accurate semantic relationships from the LLM, making it well suited to navigating unknown environments. Experimental results on the Gibson and HM3D datasets show that the proposed method outperforms current state-of-the-art object goal navigation methods.
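The abstract names the pipeline's ingredients (spatial object clustering, LLM-scored semantic clues, a gravity-repulsion model) but not their formulas. The following is a minimal Python sketch of how such pieces could fit together, assuming 2D map coordinates for detected objects and a relevance score in [0, 1] per cluster. The names `cluster_objects`, `frontier_score`, and `pick_frontier`, the `radius`, `threshold`, and `eps` parameters, single-linkage clustering, and the inverse-square decay are all hypothetical illustrations, not ChatNav's actual formulation. The LLM query that would produce each relevance score is replaced by hand-set values.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str  # object category from a detector (hypothetical input format)
    x: float    # map coordinates
    y: float


@dataclass
class Cluster:
    labels: list
    x: float                # centroid of member detections
    y: float
    relevance: float = 0.5  # stand-in for an LLM-assigned score in [0, 1]


def cluster_objects(dets, radius=2.0):
    """Single-linkage clustering via union-find: detections closer than
    `radius` end up in the same spatial cluster (an assumed choice)."""
    parent = list(range(len(dets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            a, b = dets[i], dets[j]
            if math.dist((a.x, a.y), (b.x, b.y)) < radius:
                parent[find(i)] = find(j)

    groups = {}
    for i, d in enumerate(dets):
        groups.setdefault(find(i), []).append(d)

    clusters = []
    for members in groups.values():
        cx = sum(d.x for d in members) / len(members)
        cy = sum(d.y for d in members) / len(members)
        clusters.append(Cluster([d.label for d in members], cx, cy))
    return clusters


def frontier_score(fx, fy, clusters, threshold=0.5, eps=1.0):
    """Gravity-repulsion style heuristic (illustrative only): clusters rated
    above `threshold` attract, those below repel, and each influence decays
    with squared distance; `eps` avoids division by zero."""
    score = 0.0
    for c in clusters:
        d2 = (fx - c.x) ** 2 + (fy - c.y) ** 2
        score += (c.relevance - threshold) / (d2 + eps)
    return score


def pick_frontier(frontiers, clusters, threshold=0.5, eps=1.0):
    """Return the frontier point with the highest heuristic score."""
    return max(frontiers,
               key=lambda f: frontier_score(f[0], f[1], clusters, threshold, eps))


if __name__ == "__main__":
    dets = [Detection("bed", 0, 0), Detection("nightstand", 1, 0),
            Detection("sink", 10, 0), Detection("toilet", 11, 0)]
    clusters = cluster_objects(dets)
    # Hand-set scores standing in for an LLM's answer to a question such as
    # "how likely is a pillow to be near these objects?"
    clusters[0].relevance = 0.9  # bed, nightstand
    clusters[1].relevance = 0.1  # sink, toilet
    print(pick_frontier([(1.0, 0.0), (9.0, 0.0)], clusters))  # (1.0, 0.0)
```

In this toy example the frontier next to the bed/nightstand cluster wins because that cluster attracts strongly at short range, while the bathroom cluster repels the frontier beside it. The inverse-square decay is only one way to mimic a gravitational field; any monotonically decreasing distance kernel would serve the same illustrative purpose.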

Pages: 2369–2381

Number of pages: 13
