A Contour-Aware Monocular Depth Estimation Network Using Swin Transformer and Cascaded Multiscale Fusion

Cited by: 1
Authors
Li, Tao [1 ]
Zhang, Yi [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Cascaded multiscale fusion; contour aware; monocular depth estimation; Swin Transformer;
DOI
10.1109/JSEN.2024.3370821
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Depth estimation from a monocular vision sensor is a fundamental problem in scene perception with wide industrial applications. Previous works tend to predict scene depth from high-level features extracted by convolutional neural networks (CNNs) or rely on Transformer-based encoder-decoder frameworks. However, they achieve less satisfactory results, especially around object contours. In this article, we propose a Transformer-based contour-aware depth estimation module that recovers scene depth with the aid of enhanced perception of object contours. In addition, we develop a cascaded multiscale fusion module to aggregate multilevel features, combining global context with local information and refining the depth map to higher resolution from coarse to fine. Finally, we model depth estimation as a classification problem and discretize the depth range in an adaptive way to further improve the performance of our network. Extensive experiments on mainstream public datasets (KITTI and NYUv2) demonstrate the effectiveness of our network, which exhibits superior performance against other state-of-the-art methods.
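The abstract mentions modeling depth estimation as classification over adaptively discretized depth values but gives no implementation details. Below is a minimal NumPy sketch of the general depth-as-classification idea with adaptive bins (in the spirit of approaches such as AdaBins, not the authors' exact method): an image-level head predicts bin widths that partition the depth range, a per-pixel head classifies each pixel over those bins, and the final depth is the probability-weighted sum of bin centers. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def adaptive_depth_from_logits(bin_logits, pixel_logits, d_min=0.1, d_max=10.0):
    """Sketch of depth-as-classification with adaptive discretization.

    bin_logits:   (K,) image-level logits mapped to adaptive bin widths
    pixel_logits: (H, W, K) per-pixel classification logits over the K bins
    Returns an (H, W) depth map as the probability-weighted sum of bin centers.
    """
    # Softmax over bins yields normalized widths partitioning [d_min, d_max].
    widths = np.exp(bin_logits - bin_logits.max())
    widths /= widths.sum()
    widths *= (d_max - d_min)
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    centers = 0.5 * (edges[:-1] + edges[1:])            # (K,) bin centers

    # Per-pixel softmax turns logits into a probability over the K bins.
    p = np.exp(pixel_logits - pixel_logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)

    # Expected depth per pixel: sum_k p_k * center_k (soft argmax over bins).
    return p @ centers                                  # (H, W)
```

Because the output is a convex combination of bin centers, every predicted depth lies strictly inside (d_min, d_max), and the soft weighting keeps the operation differentiable for training.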
Pages: 13620-13628
Page count: 9
Related Papers
50 records
  • [1] SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network
    Shim, Dongseok
    Kim, H. Jin
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 4983 - 4990
  • [2] CASCADED DETAIL-AWARE NETWORK FOR UNSUPERVISED MONOCULAR DEPTH ESTIMATION
    Ye, Xinchen
    Zhang, Mingliang
    Fan, Xin
    Xu, Rui
    Pu, Juncheng
    Yan, Ruoke
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [3] Contour-aware network for semantic segmentation via adaptive depth
    Jiang, Zhiyu
    Yuan, Yuan
    Wang, Qi
    NEUROCOMPUTING, 2018, 284 : 27 - 35
  • [4] DEPTHFORMER: MULTISCALE VISION TRANSFORMER FOR MONOCULAR DEPTH ESTIMATION WITH GLOBAL LOCAL INFORMATION FUSION
    Agarwal, Ashutosh
    Arora, Chetan
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3873 - 3877
  • [5] Lightweight monocular depth estimation using a fusion-improved transformer
    Sui, Xin
    Gao, Song
    Xu, Aigong
    Zhang, Cong
    Wang, Changqiang
    Shi, Zhengxu
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [6] DTTNet: Depth Transverse Transformer Network for Monocular Depth Estimation
    Kamath, Shreyas K. M.
    Rajeev, Srijith
    Panetta, Karen
    Agaian, Sos S.
    MULTIMODAL IMAGE EXPLOITATION AND LEARNING 2022, 2022, 12100
  • [7] Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation
    Cheng, Zeyu
    Zhang, Yi
    Tang, Chengkai
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 26912 - 26920
  • [8] Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer
    Xiao, Baihui
    Xu, Jingzehua
    Zhang, Zekai
    Xing, Tianyu
    Wang, Jingjing
    Ren, Yong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT II, 2024, 15017 : 419 - 433
  • [9] Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation
    Yang, Wei-Jong
    Wu, Chih-Chen
    Yang, Jar-Ferr
    SENSORS, 2025, 25 (01)
  • [10] Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion
    Sagar, Abhinav
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 656 - 662