Audio Visual Language Maps for Robot Navigation

Cited by: 1
Authors
Huang, Chenguang [1]
Mees, Oier [1]
Zeng, Andy [2]
Burgard, Wolfram [3]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
[2] Google Res, Seattle, WA USA
[3] Univ Technol Nuremberg, Nurnberg, Germany
Source
Keywords
multimodal semantic mapping; language-based navigation; open-vocabulary indexing
DOI
10.1007/978-3-031-63596-0_10
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While interacting with the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. We propose AVLMaps, a 3D spatial map representation that stores cross-modal information from audio, visual, and language cues. AVLMaps fuse features from pretrained multimodal foundation models into a multi-layer representation. This enables robots to index goals in the map based on multimodal queries, such as textual descriptions, images, or audio snippets of landmarks. AVLMaps allow for zero-shot multimodal spatial goal navigation and perform better than alternatives in ambiguous scenarios. These capabilities extend to mobile robots in the real world. Videos and code are available at https://avlmaps.github.io.
Pages: 105-117
Page count: 13