A Contour-Aware Monocular Depth Estimation Network Using Swin Transformer and Cascaded Multiscale Fusion

Cited by: 1
Authors
Li, Tao [1 ]
Zhang, Yi [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Cascaded multiscale fusion; contour aware; monocular depth estimation; Swin Transformer;
DOI
10.1109/JSEN.2024.3370821
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Depth estimation from a monocular vision sensor is a fundamental problem in scene perception with wide industrial applications. Previous works tend to predict scene depth from high-level features extracted by convolutional neural networks (CNNs) or rely on Transformer-based encoder-decoder frameworks. However, they achieve less satisfactory results, especially around object contours. In this article, we propose a Transformer-based contour-aware depth estimation module that recovers scene depth with the aid of enhanced perception of object contours. In addition, we develop a cascaded multiscale fusion module to aggregate multilevel features, combining global context with local information and refining the depth map to higher resolution from coarse to fine. Finally, we model depth estimation as a classification problem and discretize the depth range in an adaptive way to further improve the performance of our network. Extensive experiments on mainstream public datasets (KITTI and NYUv2) demonstrate the effectiveness of our network, which exhibits superior performance against other state-of-the-art methods.
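The abstract mentions modeling depth estimation as classification over adaptively discretized depth values but gives no implementation details. Below is a minimal NumPy sketch of the general depth-as-classification idea with adaptive bins (in the spirit of approaches such as AdaBins, not the authors' exact method): an image-level head predicts bin widths that partition the depth range, a per-pixel head classifies each pixel over those bins, and the final depth is the probability-weighted sum of bin centers. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def adaptive_depth_from_logits(bin_logits, pixel_logits, d_min=0.1, d_max=10.0):
    """Sketch of depth-as-classification with adaptive discretization.

    bin_logits:   (K,) image-level logits mapped to adaptive bin widths
    pixel_logits: (H, W, K) per-pixel classification logits over the K bins
    Returns an (H, W) depth map as the probability-weighted sum of bin centers.
    """
    # Softmax over bins yields normalized widths partitioning [d_min, d_max].
    widths = np.exp(bin_logits - bin_logits.max())
    widths /= widths.sum()
    widths *= (d_max - d_min)
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    centers = 0.5 * (edges[:-1] + edges[1:])            # (K,) bin centers

    # Per-pixel softmax turns logits into a probability over the K bins.
    p = np.exp(pixel_logits - pixel_logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)

    # Expected depth per pixel: sum_k p_k * center_k (soft argmax over bins).
    return p @ centers                                  # (H, W)
```

Because the output is a convex combination of bin centers, every predicted depth lies strictly inside (d_min, d_max), and the soft weighting keeps the operation differentiable for training.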
Pages: 13620-13628
Page count: 9
Related Papers
50 records
  • [1] SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network
    Shim, Dongseok
    Kim, H. Jin
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 4983 - 4990
  • [2] CASCADED DETAIL-AWARE NETWORK FOR UNSUPERVISED MONOCULAR DEPTH ESTIMATION
    Ye, Xinchen
    Zhang, Mingliang
    Fan, Xin
    Xu, Rui
    Pu, Juncheng
    Yan, Ruoke
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [3] Contour-aware network for semantic segmentation via adaptive depth
    Jiang, Zhiyu
    Yuan, Yuan
    Wang, Qi
    NEUROCOMPUTING, 2018, 284 : 27 - 35
  • [4] DEPTHFORMER: MULTISCALE VISION TRANSFORMER FOR MONOCULAR DEPTH ESTIMATION WITH GLOBAL LOCAL INFORMATION FUSION
    Agarwal, Ashutosh
    Arora, Chetan
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3873 - 3877
  • [5] Lightweight monocular depth estimation using a fusion-improved transformer
    Sui, Xin
    Gao, Song
    Xu, Aigong
    Zhang, Cong
    Wang, Changqiang
    Shi, Zhengxu
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [6] DTTNet: Depth Transverse Transformer Network for Monocular Depth Estimation
    Kamath, Shreyas K. M.
    Rajeev, Srijith
    Panetta, Karen
    Agaian, Sos S.
    MULTIMODAL IMAGE EXPLOITATION AND LEARNING 2022, 2022, 12100
  • [7] Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation
    Cheng, Zeyu
    Zhang, Yi
    Tang, Chengkai
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 26912 - 26920
  • [8] Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer
    Xiao, Baihui
    Xu, Jingzehua
    Zhang, Zekai
    Xing, Tianyu
    Wang, Jingjing
    Ren, Yong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT II, 2024, 15017 : 419 - 433
  • [9] Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation
    Yang, Wei-Jong
    Wu, Chih-Chen
    Yang, Jar-Ferr
    SENSORS, 2025, 25 (01)
  • [10] Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion
    Sagar, Abhinav
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 656 - 662