Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Times cited: 1

Authors:

Li, Rui [1]

Fischer, Tobias [1]

Segu, Mattia [1]

Pollefeys, Marc [1]

Van Gool, Luc [1]

Tombari, Federico [2,3]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] Google, Mountain View, CA 94043 USA

[3] Tech Univ Munich, Munich, Germany

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

DOI:

10.1109/CVPR52733.2024.00940

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual observation requires (i) semantic knowledge of the surroundings, and (ii) reasoning about spatial context. We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density. We introduce a vision-language modulation module to enrich point features with fine-grained semantic information. We aggregate point representations across the scene through a language-guided spatial attention mechanism to yield per-point density predictions aware of the 3D semantic context. We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation. We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work. Project page: https://ruili3.github.io/kyn.

Pages: 9848-9858

Page count: 11
