Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Cited by: 1
Authors
Li, Rui [1]
Fischer, Tobias [1]
Segu, Mattia [1]
Pollefeys, Marc [1]
Van Gool, Luc [1]
Tombari, Federico [2, 3]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Google, Mountain View, CA 94043 USA
[3] Tech Univ Munich, Munich, Germany
DOI
10.1109/CVPR52733.2024.00940
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual observation requires (i) semantic knowledge of the surroundings, and (ii) reasoning about spatial context. We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density. We introduce a vision-language modulation module to enrich point features with fine-grained semantic information. We aggregate point representations across the scene through a language-guided spatial attention mechanism to yield per-point density predictions aware of the 3D semantic context. We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation. We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work. Project page: https://ruili3.github.io/kyn.
Pages: 9848-9858
Page count: 11
Related papers
27 in total
  • [1] Share with Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency
    Monnier, Tom
    Fisher, Matthew
    Efros, Alexei A.
    Aubry, Mathieu
    COMPUTER VISION - ECCV 2022, PT I, 2022, 13661 : 285 - 303
  • [2] SpaceCLIP: A Vision-Language Pretraining Framework With Spatial Reconstruction On Text
    Zou, Bo
    Yang, Chao
    Quan, Chengbin
    Zhao, Youjian
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 519 - 528
  • [3] AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio
    Huang, Xiaoyang
    Wang, Yanjun
    Liu, Yang
    Ni, Bingbing
    Zhang, Wenjun
    Liu, Jinxian
    Li, Teng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 944 - +
  • [4] What's "up" with vision-language models? Investigating their struggle with spatial reasoning
    Kamath, Amita
    Hessel, Jack
    Chang, Kai-Wei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9161 - 9175
  • [5] Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
    Ye, Shuquan
    Xie, Yujia
    Chen, Dongdong
    Xu, Yichong
    Yuan, Lu
    Zhu, Chenguang
    Liao, Jing
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2634 - 2645
  • [6] Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency
    Tulsiani, Shubham
    Zhou, Tinghui
    Efros, Alexei A.
    Malik, Jitendra
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8754 - 8765
  • [7] Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency
    Tulsiani, Shubham
    Zhou, Tinghui
    Efros, Alexei A.
    Malik, Jitendra
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 209 - 217
  • [8] Single-View Reconstruction via Joint Analysis of Image and Shape Collections
    Huang, Qixing
    Wang, Hai
    Koltun, Vladlen
    ACM TRANSACTIONS ON GRAPHICS, 2015, 34 (04):
  • [9] Single-view 3D reconstruction via dual attention
    Li, Chenghuan
    Xiao, Meihua
    Li, Zehuan
    Chen, Fangping
    Wang, Dingli
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [10] Improving Single-View Mesh Reconstruction for Unseen Categories via Primitive-Based Representation and Mesh Augmentation
    Kuo, Yu-Liang
    Ko, Wei-Jan
    Chiu, Chen-Yi
    Chiu, Wei-Chen
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 2001 - 2008