Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Cited by: 1

Authors:

Li, Rui [1]

Fischer, Tobias [1]

Segu, Mattia [1]

Pollefeys, Marc [1]

Van Gool, Luc [1]

Tombari, Federico [2,3]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] Google, Mountain View, CA 94043 USA

[3] Tech Univ Munich, Munich, Germany

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

Keywords:

DOI:

10.1109/CVPR52733.2024.00940

Chinese Library Classification (CLC):

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual observation requires (i) semantic knowledge of the surroundings, and (ii) reasoning about spatial context. We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density. We introduce a vision-language modulation module to enrich point features with fine-grained semantic information. We aggregate point representations across the scene through a language-guided spatial attention mechanism to yield per-point density predictions aware of the 3D semantic context. We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation. We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work. Project page: https://ruili3.github.io/kyn.
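The contrast the abstract draws, between predicting density for each 3D point in isolation and predicting it after aggregating point representations across the scene, can be illustrated with a minimal sketch. This is not the KYN architecture: the feature vectors, the `head` weights, and the use of plain self-attention with identity projections are all assumptions made for illustration, and the vision-language modulation and language guidance described in the abstract are omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softplus(x):
    # Maps any real value to a positive one; used here as a density activation.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def density_isolated(features, head):
    # Each point's density depends only on its own feature vector.
    return [softplus(dot(f, head)) for f in features]

def attend(features):
    # Plain scaled dot-product self-attention with identity projections
    # (an illustrative assumption, not the paper's attention module).
    d = len(features[0])
    out = []
    for q in features:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in features])
        out.append([sum(w * v[i] for w, v in zip(weights, features))
                    for i in range(d)])
    return out

def density_with_context(features, head):
    # Each point's density depends on features aggregated from all points.
    return density_isolated(attend(features), head)

# Hypothetical per-point features and density-head weights.
point_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
head = [2.0, -1.0]

isolated = density_isolated(point_features, head)
contextual = density_with_context(point_features, head)
```

Comparing `isolated` with `contextual` shows how a point's predicted density shifts once its feature is mixed with those of its neighbors, which is the general idea behind context-aware per-point prediction.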


Pages: 9848-9858

Page count: 11
