Audio Visual Language Maps for Robot Navigation

Cited by: 1
Authors
Huang, Chenguang [1]
Mees, Oier [1]
Zeng, Andy [2]
Burgard, Wolfram [3]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
[2] Google Res, Seattle, WA USA
[3] Univ Technol Nuremberg, Nurnberg, Germany
Keywords
multimodal semantic mapping; language-based navigation; open-vocabulary indexing
DOI
10.1007/978-3-031-63596-0_10
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
While interacting with the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. We propose AVLMaps, a 3D spatial map representation that stores cross-modal information from audio, visual, and language cues. AVLMaps fuse features from pretrained multimodal foundation models into a multi-layer representation. This enables robots to index goals in the map based on multimodal queries, such as textual descriptions, images, or audio snippets of landmarks. AVLMaps allow for zero-shot multimodal spatial goal navigation and perform better than alternatives in ambiguous scenarios. These capabilities extend to mobile robots in the real world. Videos and code are available at https://avlmaps.github.io.
Pages: 105-117
Number of pages: 13