SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Cited by: 0

Authors:

Miao, Yang [1]

Engelmann, Francis [1,2]

Vysotska, Olga [1]

Tombari, Federico [2,3]

Pollefeys, Marc [1,4]

Barath, Daniel Bela [1]

Affiliations:

[1] Swiss Federal Institute of Technology, Zurich, Switzerland

[2] Google, Menlo Park, CA, USA

[3] Technical University of Munich, Munich, Germany

[4] Microsoft, Redmond, WA, USA

Source:

COMPUTER VISION - ECCV 2024, PT VIII | 2025 / Volume 15066

Keywords:

Coarse Localization; 3D Scene Graph; Multi-modality; Place Recognition

DOI:

10.1007/978-3-031-73242-3_8

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce the task of localizing an input image within a multi-modal reference map represented by a collection of 3D scene graphs. These scene graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given these modalities, the proposed method, SceneGraphLoc, learns a fixed-size embedding for each node (i.e., each object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map representation. With images, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. Code and models are available at https://scenegraphloc.github.io.
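The matching idea in the abstract (one fixed-size embedding per scene-graph node, compared against the objects visible in the query image) can be illustrated with a small sketch. This record does not describe the paper's encoders, training losses, or scoring rule, so everything below is an assumption made for illustration: the embeddings are hand-written toy vectors, and `cosine`, `scene_score`, and `rank_scenes` are hypothetical helpers that score a scene by the mean best cosine similarity between query objects and scene nodes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def scene_score(query_embs, node_embs):
    """Assumed scoring rule (not taken from the paper): for every query object,
    take its best-matching node in the scene, then average those best similarities."""
    if not query_embs or not node_embs:
        return 0.0
    best = [max(cosine(q, n) for n in node_embs) for q in query_embs]
    return sum(best) / len(best)

def rank_scenes(query_embs, scene_graphs):
    """Return scene ids ordered from most to least similar to the query."""
    scores = {sid: scene_score(query_embs, nodes) for sid, nodes in scene_graphs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy reference map: each scene graph is reduced to a list of node embeddings.
scene_graphs = {
    "kitchen": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "office": [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
}

# Toy embeddings for two objects detected in the query image.
query_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

ranking = rank_scenes(query_embs, scene_graphs)
print(ranking)
```

In this toy setup the query objects align closely with the two "kitchen" nodes, so that scene ranks first. Storing only a handful of fixed-size vectors per scene, rather than a database of images, is what makes this kind of map lightweight.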


Pages: 127-150

Page count: 24
