Audio Visual Language Maps for Robot Navigation

Cited by: 1
Authors
Huang, Chenguang [1]
Mees, Oier [1]
Zeng, Andy [2]
Burgard, Wolfram [3]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
[2] Google Res, Seattle, WA USA
[3] Univ Technol Nuremberg, Nurnberg, Germany
Keywords
multimodal semantic mapping; language-based navigation; open-vocabulary indexing
DOI
10.1007/978-3-031-63596-0_10
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
While interacting with the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. We propose AVLMaps, a 3D spatial map representation that stores cross-modal information from audio, visual, and language cues. AVLMaps fuse features from pretrained multimodal foundation models into a multi-layer representation. This enables robots to index goals in the map based on multimodal queries, such as textual descriptions, images, or audio snippets of landmarks. AVLMaps allow for zero-shot multimodal spatial goal navigation and perform better than alternatives in ambiguous scenarios. These capabilities extend to mobile robots in the real world. Videos and code are available at https://avlmaps.github.io.
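The indexing step described in the abstract, where a query is matched against features stored in the map, can be sketched as a nearest-neighbour lookup by cosine similarity. This is only an illustrative toy, not the authors' implementation: the function name `index_goal`, the flat list of map cells, and the hand-made 4-dimensional vectors are all assumptions made for this example. In the actual system the stored features and the query embedding would come from pretrained multimodal foundation models.

```python
import numpy as np

def index_goal(map_features, positions, query_embedding):
    """Return the position of the map cell most similar to the query.

    map_features: (N, D) array, one feature vector per map cell (hypothetical layout).
    positions: (N, 3) array of 3D cell coordinates.
    query_embedding: (D,) vector embedded in the same feature space.
    """
    # Normalise so that the dot product equals cosine similarity.
    feats = map_features / np.linalg.norm(map_features, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)
    scores = feats @ query
    best = int(np.argmax(scores))
    return positions[best], float(scores[best])

# Toy map with three cells and hand-made features (stand-ins for real embeddings).
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 2.0, 0.0],
                      [3.0, 1.0, 0.5]])
map_features = np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0, 0.0],
                         [0.0, 0.0, 1.0, 1.0]])
query = np.array([0.1, 0.9, 0.0, 0.0])  # closest in direction to the second cell

goal, score = index_goal(map_features, positions, query)
print(goal, score)
```

Because text, image, and audio queries would each be embedded by a model matching the corresponding map layer, the same lookup could in principle serve all three query types; how the layers are combined in ambiguous cases is not specified in this record.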
Pages: 105-117
Number of pages: 13