SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Cited by: 0
Authors
Miao, Yang [1]
Engelmann, Francis [1,2]
Vysotska, Olga [1]
Tombari, Federico [2,3]
Pollefeys, Marc [1,4]
Barath, Daniel Bela [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Google, Menlo Pk, CA USA
[3] Tech Univ Munich, Munich, Germany
[4] Microsoft, Redmond, WA USA
Source
COMPUTER VISION - ECCV 2024, PT VIII | 2025 / Vol. 15066
Keywords
Coarse Localization; 3D Scene Graph; Multi-modality; Place Recognition
DOI
10.1007/978-3-031-73242-3_8
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce the task of localizing an input image within a multi-modal reference map represented by a collection of 3D scene graphs. These scene graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given these modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing object instances) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map representation. With images, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. Code and models are available at https://scenegraphloc.github.io.
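The abstract describes coarse localization as matching per-node scene-graph embeddings against embeddings of objects visible in the query image. The following is only a minimal illustrative sketch of that matching step, not the authors' implementation: the function names (`score_scene`, `localize`), the cosine-similarity metric, and the mean-of-best-match aggregation rule are all assumptions made for illustration; the actual embedding networks and scoring in SceneGraphLoc may differ (see the paper and the linked code).

```python
# Illustrative sketch only (not the authors' code): match query-image object
# embeddings against per-node scene-graph embeddings and pick the best scene.
# The scoring rule below (mean over best-matching nodes) is an assumption.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (M x D) and b (N x D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def score_scene(query_obj_emb: np.ndarray, node_emb: np.ndarray) -> float:
    """Score one candidate scene: for each object in the query image, take its
    best-matching scene-graph node and average those similarities."""
    sim = cosine_similarity(query_obj_emb, node_emb)  # shape (M, N)
    return float(sim.max(axis=1).mean())

def localize(query_obj_emb: np.ndarray, scenes: dict[str, np.ndarray]) -> str:
    """Return the id of the scene whose graph nodes best explain the query objects."""
    return max(scenes, key=lambda sid: score_scene(query_obj_emb, scenes[sid]))

# Toy usage with random vectors standing in for learned embeddings (D = 256).
rng = np.random.default_rng(0)
scenes = {f"scene_{i}": rng.normal(size=(30, 256)) for i in range(5)}
query_objects = rng.normal(size=(8, 256))
print(localize(query_objects, scenes))
```

In this sketch, each candidate scene is reduced to a matrix of fixed-size node embeddings, so the reference map needs no image database at query time, which is the storage advantage the abstract highlights.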
Pages: 127-150
Number of pages: 24