Audio Visual Language Maps for Robot Navigation

Cited by: 1

Authors:

Huang, Chenguang [1]

Mees, Oier [1]

Zeng, Andy [2]

Burgard, Wolfram [3]

Affiliations:

[1] Univ Freiburg, Freiburg, Germany

[2] Google Res, Seattle, WA USA

[3] Univ Technol Nuremberg, Nurnberg, Germany

Source:

EXPERIMENTAL ROBOTICS, ISER 2023 | 2024 / Vol. 30

Keywords:

multimodal semantic mapping; language-based navigation; open-vocabulary indexing

DOI:

10.1007/978-3-031-63596-0_10

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

While interacting with the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. We propose AVLMaps, a 3D spatial map representation that stores cross-modal information from audio, visual, and language cues. AVLMaps fuse features from pretrained multimodal foundation models into a multi-layer representation. This enables robots to index goals in the map based on multimodal queries, such as textual descriptions, images, or audio snippets of landmarks. AVLMaps allow for zero-shot multimodal spatial goal navigation and perform better than alternatives in ambiguous scenarios. These capabilities extend to mobile robots in the real world. Videos and code are available at https://avlmaps.github.io.
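The indexing idea in the abstract (finding a goal location from a text, image, or audio query) can be illustrated with a toy sketch. This is not the authors' implementation: the layer names, the dictionary layout, the three-dimensional feature vectors, and the helper `index_goal` are all hypothetical, and the query vectors are hand-written placeholders standing in for embeddings that a pretrained encoder would produce. It only assumes that each map layer stores a feature per 3D cell and that a query is matched by cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def index_goal(layer, query):
    """Return the map cell whose stored feature best matches the query."""
    return max(layer, key=lambda cell: cosine(layer[cell], query))


# Hypothetical toy map: each layer maps a 3D cell (x, y, z) to a feature vector.
toy_map = {
    "visual": {(0, 0, 0): [1.0, 0.0, 0.0], (2, 1, 0): [0.0, 1.0, 0.0]},
    "audio": {(5, 3, 0): [0.0, 0.0, 1.0], (1, 4, 0): [0.6, 0.0, 0.8]},
}

# Placeholder query embeddings (assumed; a real system would compute these).
text_query = [0.1, 0.9, 0.0]
sound_query = [0.0, 0.1, 0.9]

visual_goal = index_goal(toy_map["visual"], text_query)
audio_goal = index_goal(toy_map["audio"], sound_query)
print(visual_goal, audio_goal)
```

Keeping one layer per modality, as in this sketch, means a query is compared only against features from a matching embedding space; how the actual system combines results across layers is not specified in this record.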


Pages: 105-117

Page count: 13

Related records: 50 in total

[41] A Visual Navigation Method of Substation Inspection Robot
Liu, Weidong
Zhang, Shaohai
Fan, Shaosheng
PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC), VOL 1, 2016 : 148 - 153
[42] EXPERIMENT IN AUDIO-VISUAL LANGUAGE TEACHING
LOVELAND, CI
ADULT EDUCATION-LONDON, 1970, 43 (01) : 15 - 21
[43] On the Character Analysis in the Audio-visual Language
Zhang, Ruirui
Lu, Xuan
9TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED INDUSTRIAL DESIGN & CONCEPTUAL DESIGN, VOLS 1 AND 2, 2008 : 78 - 82
[44] Information optimization in coupled audio-visual cortical maps
Kardar, M
Zee, A
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (25) : 15894 - 15897
[45] Audio-Visual Group Recognition Using Diffusion Maps
Keller, Yosi
Coifman, Ronald R.
Lafon, Stephane
Zucker, Steven W.
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, 58 (01) : 403 - 413
[46] Fuzzy audio-visual feature maps for speaker identification
Chibelushi, CC
APPLICATIONS AND SCIENCE IN SOFT COMPUTING, 2004 : 317 - 322
[47] Building geometric certainty maps and application to the navigation of a mobile robot
Araujo, R
de Almeida, AT
IECON '98 - PROCEEDINGS OF THE 24TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, VOLS 1-4, 1998 : 1186 - 1191
[48] NavTopo: Leveraging Topological Maps for Autonomous Navigation of a Mobile Robot
Muravyev, Kirill
Yakovlev, Konstantin
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2024, 14898 LNAI : 144 - 157
[49] Robot navigation via spatial and temporal coherent semantic maps
Kostavelis, Ioannis
Charalampous, Konstantinos
Gasteratos, Antonios
Tsotsos, John K.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 48 : 173 - 187
[50] EXTRACTING TOPOLOGICAL INFORMATION FROM GRID MAPS FOR ROBOT NAVIGATION
Portugal, David
Rocha, Rui P.
ICAART: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2012 : 137 - 143
