Instructed fine-tuning based on semantic consistency constraint for deep multi-view stereo

Cited by: 0

Authors:

Zhang, Yan [1,2]

Yan, Hongping [1]

Ding, Kun [2]

Cai, Tingting [1,2]

Zhou, Yueyue [1,2]

Affiliations:

[1] China Univ Geosci, Sch Informat Engn, Beijing 100083, Peoples R China

[2] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China

Source:

APPLIED INTELLIGENCE | 2025 / Vol. 55 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-view stereo; text instructions; Semantic consistency; Test-time fine-tuning; Semantic segmentation; Grounded-SAM;

DOI:

10.1007/s10489-025-06382-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing depth map-based multi-view stereo (MVS) methods typically assume that texture features remain consistent across different viewpoints. However, factors such as lighting changes, occlusions, and weakly textured regions can lead to inconsistent texture features, posing challenges for feature extraction. As a result, relying solely on texture consistency does not always yield high-quality reconstruction results. In contrast, high-level semantic concepts corresponding to the same objects remain consistent across different viewpoints, which we define as semantic consistency. Since designing and training new MVS networks from scratch is both costly and labor-intensive, we propose fine-tuning existing depth map-based MVS networks during the testing phase by incorporating semantic consistency constraints to improve the reconstruction quality in regions with poor results. Considering the robust open-set detection and zero-shot segmentation capabilities of Grounded-SAM, we first use Grounded-SAM to generate semantic segmentation masks for arbitrary objects in multi-view images based on text instructions. These masks are then used to fine-tune pre-trained MVS networks by aligning them from different viewpoints to the reference viewpoint and optimizing the depth maps based on the proposed semantic consistency loss function. Our method is designed as a test-time approach that is adaptable to a wide range of depth map-based MVS networks, requiring only adjustments to a small number of depth-related parameters. Comprehensive experimental evaluation across different MVS networks and large-scale scenarios demonstrates that our method effectively enhances reconstruction quality at a lower computational cost.
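
The mask-alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the record does not give their loss function. Everything here is an assumption made for illustration: the function names `warp_source_mask_to_ref` and `semantic_inconsistency`, the shared pinhole intrinsics, the pose convention `X_src = R * X_ref + t`, nearest-neighbour sampling, and the use of a plain disagreement rate in place of a differentiable loss. The point it shows is that a source-view mask, warped into the reference view through the reference depth map, agrees with the reference mask when the depth is right and disagrees when it is wrong.

```python
import math


def warp_source_mask_to_ref(depth_ref, mask_src, intrinsics, rotation, translation):
    """Sample the source mask where each reference pixel projects to.

    Assumed (hypothetical) convention: X_src = rotation * X_ref + translation,
    with both views sharing pinhole intrinsics (fx, fy, cx, cy).
    Pixels landing outside the source image or behind the camera stay None.
    """
    fx, fy, cx, cy = intrinsics
    height, width = len(depth_ref), len(depth_ref[0])
    warped = [[None] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            d = depth_ref[v][u]
            if d <= 0:
                continue
            # Back-project the reference pixel to a 3D point using its depth.
            p = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Express the point in the source camera frame.
            q = [sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
                 for i in range(3)]
            if q[2] <= 0:
                continue
            # Project into the source image and round to the nearest pixel.
            us = math.floor(fx * q[0] / q[2] + cx + 0.5)
            vs = math.floor(fy * q[1] / q[2] + cy + 0.5)
            if 0 <= vs < len(mask_src) and 0 <= us < len(mask_src[0]):
                warped[v][u] = mask_src[vs][us]
    return warped


def semantic_inconsistency(mask_ref, warped):
    """Fraction of validly warped pixels whose label differs from the reference."""
    valid = disagree = 0
    for row_ref, row_warped in zip(mask_ref, warped):
        for label_ref, label_src in zip(row_ref, row_warped):
            if label_src is None:
                continue
            valid += 1
            disagree += label_ref != label_src
    return disagree / valid if valid else 0.0


# Toy scene (made up): a fronto-parallel plane at depth 2, with the source
# camera offset so that the object appears 2 pixels to the right.
H, W = 2, 8
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = (4.0, 4.0, 0.0, 0.0)
translation = (1.0, 0.0, 0.0)
mask_ref = [[0, 0, 1, 1, 0, 0, 0, 0] for _ in range(H)]
mask_src = [[0, 0, 0, 0, 1, 1, 0, 0] for _ in range(H)]

depth_true = [[2.0] * W for _ in range(H)]
depth_bad = [[4.0] * W for _ in range(H)]

loss_true = semantic_inconsistency(
    mask_ref, warp_source_mask_to_ref(depth_true, mask_src, intrinsics, identity, translation))
loss_bad = semantic_inconsistency(
    mask_ref, warp_source_mask_to_ref(depth_bad, mask_src, intrinsics, identity, translation))
print(loss_true, loss_bad)
```

In this toy case the correct depth gives zero disagreement, while the wrong depth shifts the warped mask and raises the score. A test-time fine-tuning scheme would need a differentiable variant (for example, bilinear sampling of soft masks) so that gradients can reach the depth-related parameters.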

Pages: 25

50 items in total

[1] MULTI-VIEW STEREO WITH SEMANTIC PRIORS
Stathopoulou, E. -K.
Remondino, F.
27TH CIPA INTERNATIONAL SYMPOSIUM: DOCUMENTING THE PAST FOR A BETTER FUTURE, 2019, 42-2 (W15): 1135 - 1140
[2] Pyramid Multi-View Stereo with Local Consistency
Liao, Jie
Fu, Yanping
Yan, Qingan
Xiao, Chunxia
COMPUTER GRAPHICS FORUM, 2019, 38 (07) : 335 - 346
[3] Multi-view Semantic Consistency based Information Bottleneck for Clustering
Yan, Wenbiao
Zhou, Yiyang
Wang, Yifei
Zheng, Qinghai
Zhu, Jihua
KNOWLEDGE-BASED SYSTEMS, 2024, 288
[4] Multi-view stereo algorithms based on deep learning: a survey
Huang, Hongbo
Yan, Xiaoxu
Zheng, Yaolin
He, Jiayu
Xu, Longfei
Qin, Dechun
Multimedia Tools and Applications, 2025, 84 (06) : 2877 - 2908
[5] Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations
Liu, Linlin
Li, Xingxuan
Thakkar, Megh
Li, Xin
Joty, Shafiq
Si, Luo
Bing, Lidong
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 4799 - 4816
[6] Multi-View Image Classification With Visual, Semantic and View Consistency
Zhang, Chunjie
Cheng, Jian
Tian, Qi
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 617 - 627
[7] Deep Multi-View Stereo Gone Wild
Darmon, Francois
Bascle, Benedicte
Devaux, Jean-Clement
Monasse, Pascal
Aubry, Mathieu
2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021), 2021: 484 - 493
[8] Multi-Scale Geometric Consistency Guided Multi-View Stereo
Xu, Qingshan
Tao, Wenbing
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 5478 - 5487
[9] A Multi-View Stereo Evaluation for Fine Object Reconstruction
Peat, Casey
Bachelor, Oliver
Green, Richard
2020 35TH INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2020
[10] MVS2: Deep Unsupervised Multi-view Stereo with Multi-View Symmetry
Dai, Yuchao
Zhu, Zhidong
Rao, Zhibo
Li, Bo
2019 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2019), 2019: 1 - 8