Self-Supervised Pretraining With Monocular Height Estimation for Semantic Segmentation

Cited by: 1

Authors:

Xiong, Zhitong [1]

Chen, Sining [1]

Shi, Yilei [2]

Zhu, Xiao Xiang [1,3]

Affiliations:

[1] Tech Univ Munich TUM, Chair Data Sci Earth Observat, D-80333 Munich, Germany

[2] Tech Univ Munich TUM, Sch Engn & Design, D-80333 Munich, Germany

[3] Munich Ctr Machine Learning, Chair Data Sci Earth Observat, D-80333 Munich, Germany

Source:

IEEE Transactions on Geoscience and Remote Sensing | 2024 / Vol. 62

Keywords:

Semantics; Task analysis; Estimation; Neurons; Semantic segmentation; Data models; Buildings; Foundation models; interpretable deep learning; monocular height estimation (MHE); self-supervised pretraining;

DOI:

10.1109/TGRS.2024.3412629

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Monocular height estimation (MHE) is key for generating 3-D city models, essential for swift disaster response. Moving beyond the traditional focus on performance enhancement, our study probes the interpretability of MHE networks. We find that neurons within MHE models demonstrate selectivity for both height and semantic classes. This insight sheds light on the complex inner workings of MHE models and inspires innovative strategies for leveraging elevation data more effectively. Informed by this insight, we propose a framework that employs MHE as a self-supervised pretraining method for remote sensing (RS) imagery. This approach significantly enhances the performance of semantic segmentation tasks. Furthermore, we develop a disentangled latent transformer (DLT) module that leverages explainable deep representations from pretrained MHE networks for unsupervised semantic segmentation. Our method demonstrates the significant potential of MHE tasks in developing foundation models for sophisticated pixel-level semantic analyses. Additionally, we present a new dataset designed to benchmark the performance of both semantic segmentation and height estimation tasks. The dataset and code will be publicly available at https://github.com/zhu-xlab/DLT-MHE.pytorch.

Pages: 12
