SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Times cited: 0

Authors:

Miao, Yang [1]

Engelmann, Francis [1, 2]

Vysotska, Olga [1]

Tombari, Federico [2, 3]

Pollefeys, Marc [1, 4]

Barath, Daniel Bela [1]

Affiliations:

[1] Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland

[2] Google, Menlo Park, CA, USA

[3] Technical University of Munich, Munich, Germany

[4] Microsoft, Redmond, WA, USA

Source:

COMPUTER VISION - ECCV 2024, PT VIII | 2025 / Vol. 15066

Keywords:

Coarse Localization; 3D Scene Graph; Multi-modality; Place Recognition

DOI:

10.1007/978-3-031-73242-3_8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce the task of localizing an input image within a multi-modal reference map represented by a collection of 3D scene graphs. These scene graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, and offer a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given these modalities, the proposed method, SceneGraphLoc, learns a fixed-size embedding for each node of the scene graph (each node represents an object instance), enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map representation. With images, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and running orders of magnitude faster. Code and models are available at https://scenegraphloc.github.io.
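The abstract describes coarse localization as matching the objects visible in a query image against fixed-size node embeddings and picking the scene graph that best explains the image. The following is a minimal sketch of that retrieval step only, assuming the query and node embeddings have already been computed by some encoder. The function names (`l2_normalize`, `scene_score`, `rank_scenes`), the scoring rule (mean over query embeddings of the best cosine match to any node), and the synthetic data are hypothetical illustrations, not SceneGraphLoc's released code or its exact scoring procedure.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def scene_score(query_emb: np.ndarray, node_emb: np.ndarray) -> float:
    """Hypothetical scene score (an assumption, not the paper's rule).

    query_emb: (Q, D) embeddings of objects/regions seen in the query image.
    node_emb:  (N, D) fixed-size embeddings, one per scene-graph node.
    Each query embedding is matched to its most similar node; the scene
    score is the mean of these best-match cosine similarities.
    """
    sim = l2_normalize(query_emb) @ l2_normalize(node_emb).T  # shape (Q, N)
    return float(sim.max(axis=1).mean())


def rank_scenes(query_emb: np.ndarray, scene_graphs: dict[str, np.ndarray]) -> list[str]:
    """Return scene ids sorted from most to least likely to contain the query."""
    scores = {sid: scene_score(query_emb, nodes) for sid, nodes in scene_graphs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic demo (hypothetical data): five scene graphs with 8 nodes each.
rng = np.random.default_rng(0)
D = 32
scenes = {f"scene_{i}": rng.normal(size=(8, D)) for i in range(5)}

# The "query" sees noisy versions of four objects that live in scene_3.
query = scenes["scene_3"][:4] + 0.05 * rng.normal(size=(4, D))

print(rank_scenes(query, scenes)[0])  # scene_3 should rank first
```

Because each node embedding has a fixed size, a reference scene reduces to a small N×D matrix rather than a set of images, which is in line with the abstract's point about storage; the max-then-mean scoring here is only a simple stand-in and could be swapped for any match-and-vote scheme.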


Pages: 127-150

Number of pages: 24
