Audio Visual Language Maps for Robot Navigation

Cited by: 1
Authors
Huang, Chenguang [1 ]
Mees, Oier [1 ]
Zeng, Andy [2 ]
Burgard, Wolfram [3 ]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
[2] Google Res, Seattle, WA USA
[3] Univ Technol Nuremberg, Nurnberg, Germany
Keywords
multimodal semantic mapping; language-based navigation; open-vocabulary indexing
DOI
10.1007/978-3-031-63596-0_10
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
While interacting with the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. We propose AVLMaps, a 3D spatial map representation that stores cross-modal information from audio, visual, and language cues. AVLMaps fuse features from pretrained multimodal foundation models into a multi-layer representation. This enables robots to index goals in the map based on multimodal queries, such as textual descriptions, images, or audio snippets of landmarks. AVLMaps allow for zero-shot multimodal spatial goal navigation and perform better than alternatives in ambiguous scenarios. These capabilities extend to mobile robots in the real world. Videos and code are available at https://avlmaps.github.io.
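The goal-indexing idea in the abstract can be illustrated with a toy sketch. It is not the AVLMaps implementation: it assumes that each map position stores one feature vector and that a query (text, image, or audio) has already been embedded into the same space, then picks the position with the highest cosine similarity. The names `cosine`, `index_goal`, `toy_map`, and `query` are hypothetical, and the vectors are hand-written stand-ins for embeddings from a pretrained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def index_goal(feature_map, query):
    """Return the map position whose stored feature best matches the query."""
    return max(feature_map, key=lambda pos: cosine(feature_map[pos], query))


# Hypothetical toy map: 3D position -> feature vector (assumption: stand-ins
# for features that a pretrained multimodal model would provide).
toy_map = {
    (0, 0, 0): [1.0, 0.0, 0.0],
    (2, 1, 0): [0.0, 1.0, 0.0],
    (4, 3, 0): [0.0, 0.0, 1.0],
}

# Hypothetical embedding of a text, image, or audio query.
query = [0.1, 0.9, 0.0]

goal = index_goal(toy_map, query)
print(goal)
```

In this sketch the query modality does not matter once everything shares one embedding space; how a real system would combine several modalities to resolve ambiguous goals is outside what the sketch covers.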
Pages: 105-117
Page count: 13
Related papers
50 in total
  • [21] RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
    Yang, Zeyuan
    Liu, Jiageng
    Chen, Peihao
    Cherian, Anoop
    Marks, Tim K.
    Le Roux, Jonathan
    Gan, Chuang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 16251 - 16261
  • [22] Underwater video mosaics as visual navigation maps
    Gracias, N
    Santos-Victor, J
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2000, 79 (01) : 66 - 91
  • [23] VisPod: Content-Based Audio Visual Navigation
    Zhi, Qiyu
    Lin, Suwen
    He, Shuai
    Metoyer, Ronald
    Chawla, Nitesh V.
    COMPANION OF THE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES (IUI'18), 2018,
  • [24] Audio Visual System with Cascade-Correlation Neural Network for Moving Audio Visual Robot
    Bekiarski, Alexander
    NN'09: PROCEEDINGS OF THE 10TH WSEAS INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, 2009, : 96 - 99
  • [25] Building semantic grid maps for domestic robot navigation
    Qi, Xianyu
    Wang, Wei
    Yuan, Mei
    Wang, Yuliang
    Li, Mingbo
    Xue, Lin
    Sun, Yingpin
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2020, 17 (01)
  • [26] Robot Navigation in Unseen Environments using Coarse Maps
    Xu, Chengguang
    Amato, Christopher
    Wong, Lawson L. S.
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 2932 - 2938
  • [27] Robot Navigation in Hand-Drawn Sketched Maps
    Boniardi, Federico
    Behzadian, Bahram
    Burgard, Wolfram
    Tipaldi, Gian Diego
    2015 EUROPEAN CONFERENCE ON MOBILE ROBOTS (ECMR), 2015,
  • [28] Robot Navigation in Complex Workspaces Using Harmonic Maps
    Vlantis, Panagiotis
    Vrohidis, Constantinos
    Bechlioulis, Charalampos P.
    Kyriakopoulos, Kostas J.
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 1726 - 1731
  • [29] Path Tracing on Polar Depth Maps for Robot Navigation
    Kostavelis, Ioannis
    Boukas, Evangelos
    Nalpantidis, Lazaros
    Gasteratos, Antonios
    CELLULAR AUTOMATA, ACRI 2012, 2012, 7495 : 395 - 404
  • [30] Behavior Based Rescue Robot Audio Navigation and Obstacle Avoidance
    Liu Zuojun
    Li Guangyao
    Yang Peng
    Liu Feng
    Chen Chu
    PROCEEDINGS OF THE 31ST CHINESE CONTROL CONFERENCE, 2012, : 4847 - 4851