SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Authors
Miao, Yang [1]
Engelmann, Francis [1, 2]
Vysotska, Olga [1]
Tombari, Federico [2, 3]
Pollefeys, Marc [1, 4]
Barath, Daniel Bela [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Google, Menlo Pk, CA USA
[3] Tech Univ Munich, Munich, Germany
[4] Microsoft, Redmond, WA USA
Keywords
Coarse Localization; 3D Scene Graph; Multi-modality; Place Recognition
DOI
10.1007/978-3-031-73242-3_8
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce the task of localizing an input image within a multi-modal reference map represented by a collection of 3D scene graphs. These scene graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given these modalities, the proposed method, SceneGraphLoc, learns a fixed-size embedding for each node (i.e., each object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map representation. With images, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. Code and models are available at https://scenegraphloc.github.io.
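The matching idea in the abstract (one fixed-size embedding per scene-graph node, compared against the objects visible in a query image) can be illustrated with a toy retrieval sketch. This is not the authors' implementation: the function names (`cosine`, `score_scene`, `localize`), the max-then-mean scoring rule, and the hand-written 3-dimensional vectors are all assumptions made for illustration. In the actual method the embeddings are learned, and the abstract does not specify how scores are computed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_scene(query_embs, node_embs):
    """Assumed scoring rule: for each query object embedding, take its best
    cosine match among the scene's node embeddings, then average."""
    if not query_embs or not node_embs:
        return 0.0
    best_matches = [max(cosine(q, n) for n in node_embs) for q in query_embs]
    return sum(best_matches) / len(best_matches)


def localize(query_embs, scene_graphs):
    """Return the id of the highest-scoring scene and the per-scene scores."""
    scores = {sid: score_scene(query_embs, nodes)
              for sid, nodes in scene_graphs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: each scene graph is reduced to a list of node embeddings.
scene_graphs = {
    "kitchen": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "office": [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
}
# Hypothetical embeddings of two objects visible in the query image.
query = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

best_scene, scene_scores = localize(query, scene_graphs)
print(best_scene, scene_scores)
```

Because only a small set of per-object vectors is stored for each scene, rather than a database of reference images, a map of this form stays compact, which is the storage advantage the abstract points to.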
Pages: 127-150
Page count: 24