Instructed fine-tuning based on semantic consistency constraint for deep multi-view stereo

Authors
Zhang, Yan [1, 2]
Yan, Hongping [1]
Ding, Kun [2]
Cai, Tingting [1, 2]
Zhou, Yueyue [1, 2]
Affiliations
[1] China Univ Geosci, Sch Informat Engn, Beijing 100083, Peoples R China
[2] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-view stereo; Text instructions; Semantic consistency; Test-time fine-tuning; Semantic segmentation; Grounded-SAM
DOI
10.1007/s10489-025-06382-9
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing depth map-based multi-view stereo (MVS) methods typically assume that texture features remain consistent across viewpoints. However, lighting changes, occlusions, and weakly textured regions can make texture features inconsistent, which complicates feature extraction. As a result, relying solely on texture consistency does not always yield high-quality reconstructions. In contrast, the high-level semantic concepts attached to the same object remain consistent across viewpoints, a property we define as semantic consistency. Since designing and training new MVS networks from scratch is both costly and labor-intensive, we propose fine-tuning existing depth map-based MVS networks at test time with semantic consistency constraints to improve reconstruction quality in regions with poor results. Given the robust open-set detection and zero-shot segmentation capabilities of Grounded-SAM, we first use Grounded-SAM to generate semantic segmentation masks for arbitrary objects in multi-view images based on text instructions. These masks are then used to fine-tune pre-trained MVS networks by aligning the masks from different viewpoints to the reference viewpoint and optimizing the depth maps with the proposed semantic consistency loss function. Our method is designed as a test-time approach that is adaptable to a wide range of depth map-based MVS networks and requires adjusting only a small number of depth-related parameters. Comprehensive experimental evaluation across different MVS networks and large-scale scenarios demonstrates that our method effectively enhances reconstruction quality at a lower computational cost.
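The mask-alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' loss function: it assumes a pinhole camera, a known relative pose from the reference view to a source view, and integer label masks, and it only counts label disagreements after warping, whereas a loss used for fine-tuning would need to be differentiable with respect to depth. The names `warp_pixel` and `semantic_consistency_loss` and all parameters are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]       # per-pixel semantic labels, indexed [row][col]
Depth = List[List[float]]    # per-pixel depth of the reference view


def warp_pixel(u: int, v: int, depth: float,
               intrinsics: Tuple[float, float, float, float],
               rotation: Sequence[Sequence[float]],
               translation: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project reference pixel (u, v) with the given depth into the source view."""
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the reference camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the assumed rigid transform (reference frame -> source frame).
    xs, ys, zs = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zs <= 0:
        return None  # point lies behind the source camera
    return fx * xs / zs + cx, fy * ys / zs + cy


def semantic_consistency_loss(ref_mask: Mask, src_mask: Mask, ref_depth: Depth,
                              intrinsics: Tuple[float, float, float, float],
                              rotation: Sequence[Sequence[float]],
                              translation: Sequence[float]) -> float:
    """Fraction of validly warped reference pixels whose label differs in the source mask."""
    height, width = len(src_mask), len(src_mask[0])
    mismatches = 0
    valid = 0
    for v, row in enumerate(ref_mask):
        for u, label in enumerate(row):
            warped = warp_pixel(u, v, ref_depth[v][u], intrinsics, rotation, translation)
            if warped is None:
                continue
            # Nearest-neighbour lookup (round half up) in the source mask.
            us = math.floor(warped[0] + 0.5)
            vs = math.floor(warped[1] + 0.5)
            if not (0 <= us < width and 0 <= vs < height):
                continue  # warped outside the source image
            valid += 1
            mismatches += int(src_mask[vs][us] != label)
    return mismatches / valid if valid else 0.0


# Toy setup: the source camera sees the scene shifted one pixel to the right
# when the depth is 1.0 (unit focal length, principal point at the origin).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ref_mask = [[0, 1, 1], [0, 1, 1]]
src_mask = [[0, 0, 1], [0, 0, 1]]
good_depth = [[1.0] * 3 for _ in range(2)]
bad_depth = [[0.5] * 3 for _ in range(2)]
camera = (1.0, 1.0, 0.0, 0.0)
shift = (1.0, 0.0, 0.0)

loss_good = semantic_consistency_loss(ref_mask, src_mask, good_depth, camera, identity, shift)
loss_bad = semantic_consistency_loss(ref_mask, src_mask, bad_depth, camera, identity, shift)
```

In this toy setup the correct depth aligns every warped label with the source mask, while the halved depth warps pixels onto the wrong object, so the mismatch fraction rises. That gap is the signal a semantic consistency constraint relies on in regions where texture cues are unreliable.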
Pages: 25