SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Authors
Miao, Yang [1]
Engelmann, Francis [1, 2]
Vysotska, Olga [1]
Tombari, Federico [2, 3]
Pollefeys, Marc [1, 4]
Barath, Daniel Bela [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Google, Menlo Pk, CA USA
[3] Tech Univ Munich, Munich, Germany
[4] Microsoft, Redmond, WA USA
Source
Keywords
Coarse Localization; 3D Scene Graph; Multi-modality; Place Recognition
DOI
10.1007/978-3-031-73242-3_8
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce the task of localizing an input image within a multi-modal reference map represented by a collection of 3D scene graphs. These scene graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given these modalities, the proposed method, SceneGraphLoc, learns a fixed-size embedding for each node (i.e., each object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map representation. With images, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. Code and models are available at https://scenegraphloc.github.io.
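The matching idea in the abstract (one fixed-size embedding per scene-graph node, compared against the objects visible in the query image) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure or how per-object matches are aggregated, so cosine similarity and a mean of best matches are assumptions here, the function names `cosine`, `score_scene`, and `rank_scenes` are hypothetical, and the embeddings are random stand-ins rather than learned features.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def score_scene(query_embs, node_embs):
    """Assumed scoring rule: for each query object take its best-matching node,
    then average those best similarities. Both lists must be non-empty."""
    best = [max(cosine(q, n) for n in node_embs) for q in query_embs]
    return sum(best) / len(best)


def rank_scenes(query_embs, scenes):
    """Return (scene_name, score) pairs sorted from most to least similar."""
    scores = {name: score_scene(query_embs, nodes) for name, nodes in scenes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: three scene graphs, each with five 16-dimensional node embeddings.
# Random vectors stand in for learned embeddings.
random.seed(0)
DIM = 16
scenes = {
    f"scene_{i}": [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
    for i in range(3)
}

# A query "image" that sees three objects of scene_1, with small noise added
# to mimic the gap between image-side and map-side embeddings.
query = [[v + random.gauss(0, 0.05) for v in node] for node in scenes["scene_1"][:3]]

ranking = rank_scenes(query, scenes)
best_scene = ranking[0][0]
print(best_scene)
```

Because each scene is reduced to a handful of fixed-size vectors, the reference map in this sketch stores only the node embeddings rather than a database of images, which is the storage trade-off the abstract refers to.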
Pages: 127-150
Page count: 24