SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Cited by: 0
Authors
Miao, Yang [1]
Engelmann, Francis [1,2]
Vysotska, Olga [1]
Tombari, Federico [2,3]
Pollefeys, Marc [1,4]
Barath, Daniel Bela [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Google, Menlo Pk, CA USA
[3] Tech Univ Munich, Munich, Germany
[4] Microsoft, Redmond, WA USA
Source
COMPUTER VISION - ECCV 2024, PT VIII | 2025 / Vol. 15066
Keywords
Coarse Localization; 3D Scene Graph; Multi-modality; Place Recognition
DOI
10.1007/978-3-031-73242-3_8
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce the task of localizing an input image within a multi-modal reference map represented by a collection of 3D scene graphs. These scene graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given these modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing object instances) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map representation. With images, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. Code and models are available at https://scenegraphloc.github.io.
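The abstract describes coarse localization as matching per-node scene-graph embeddings against embeddings of objects visible in the query image. The following is only a minimal illustrative sketch of that matching step, not the authors' implementation: the function names (`score_scene`, `localize`), the cosine-similarity metric, and the mean-of-best-match aggregation rule are all assumptions made for illustration; the actual embedding networks and scoring in SceneGraphLoc may differ (see the paper and the linked code).

```python
# Illustrative sketch only (not the authors' code): match query-image object
# embeddings against per-node scene-graph embeddings and pick the best scene.
# The scoring rule below (mean over best-matching nodes) is an assumption.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (M x D) and b (N x D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def score_scene(query_obj_emb: np.ndarray, node_emb: np.ndarray) -> float:
    """Score one candidate scene: for each object in the query image, take its
    best-matching scene-graph node and average those similarities."""
    sim = cosine_similarity(query_obj_emb, node_emb)  # shape (M, N)
    return float(sim.max(axis=1).mean())

def localize(query_obj_emb: np.ndarray, scenes: dict[str, np.ndarray]) -> str:
    """Return the id of the scene whose graph nodes best explain the query objects."""
    return max(scenes, key=lambda sid: score_scene(query_obj_emb, scenes[sid]))

# Toy usage with random vectors standing in for learned embeddings (D = 256).
rng = np.random.default_rng(0)
scenes = {f"scene_{i}": rng.normal(size=(30, 256)) for i in range(5)}
query_objects = rng.normal(size=(8, 256))
print(localize(query_objects, scenes))
```

In this sketch, each candidate scene is reduced to a matrix of fixed-size node embeddings, so the reference map needs no image database at query time, which is the storage advantage the abstract highlights.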
Pages: 127-150
Number of pages: 24