eViTBins: Edge-Enhanced Vision-Transformer Bins for Monocular Depth Estimation on Edge Devices

Cited by: 0
Authors
She, Yutong [1 ]
Li, Peng [1 ]
Wei, Mingqiang [1 ]
Liang, Dong [1 ]
Chen, Yiping [2 ]
Xie, Haoran [3 ]
Wang, Fu Lee [4 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Sun Yat Sen Univ, Sch Geospatial Engn & Sci, Zhuhai 519082, Peoples R China
[3] Lingnan Univ, Sch Data Sci, Hong Kong, Peoples R China
[4] Hong Kong Metropolitan Univ, Sch Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Edge-enhanced vision transformer; adaptive depth bins; monocular depth estimation; edge AI; unmanned aerial vehicle; traffic monitoring;
DOI
10.1109/TITS.2024.3480114
Chinese Library Classification
TU [Building Science];
Discipline classification code
0813 ;
Abstract
Monocular depth estimation (MDE) remains a fundamental yet not well-solved problem in computer vision. Current MDE methods often produce blurred or even indistinct depth boundaries, degrading the quality of vision-based intelligent transportation systems. This paper presents an edge-enhanced vision transformer bins network for monocular depth estimation, termed eViTBins. eViTBins has three core modules to predict monocular depth maps with exceptional smoothness, accuracy, and fidelity to scene structures and object edges. First, a multi-scale feature fusion module is proposed to circumvent the loss of depth information at various levels during depth regression. Second, an image-guided edge-enhancement module is proposed to accurately infer depth values around image boundaries. Third, a vision transformer-based depth discretization module is introduced to comprehend the global depth distribution. Meanwhile, unlike most MDE models that rely on high-performance GPUs, eViTBins is optimized for seamless deployment on edge devices, such as NVIDIA Jetson Nano and Google Coral SBC, making it ideal for real-time intelligent transportation system applications. Extensive experimental evaluations corroborate the superiority of eViTBins over competing methods, notably in terms of preserving depth edges and global depth representations.
Pages: 20320 - 20334
Number of pages: 15
Related papers
50 in total
  • [31] Self-supervised monocular depth estimation with occlusion mask and edge awareness
    Zhou, Shi
    Zhu, Miaomiao
    Li, Zhen
    Li, He
    Mizumachi, Mitsunori
    Zhang, Lifeng
    ARTIFICIAL LIFE AND ROBOTICS, 2021, 26 (03) : 354 - 359
  • [32] Self-supervised monocular depth estimation with occlusion mask and edge awareness
    Shi Zhou
    Miaomiao Zhu
    Zhen Li
    He Li
    Mitsunori Mizumachi
    Lifeng Zhang
    Artificial Life and Robotics, 2021, 26 : 354 - 359
  • [33] A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles
    Bavirisetti, Durga Prasad
    Martinsen, Herman Ryen
    Kiss, Gabriel Hanssen
    Lindseth, Frank
    IEEE OPEN JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 4 : 909 - 928
  • [34] DEPTHFORMER: MULTISCALE VISION TRANSFORMER FOR MONOCULAR DEPTH ESTIMATION WITH GLOBAL LOCAL INFORMATION FUSION
    Agarwal, Ashutosh
    Arora, Chetan
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3873 - 3877
  • [35] Deep Learning-Based Monocular Estimation of Distance and Height for Edge Devices
    Gasienica-Jozkowy, Jan
    Cyganek, Boguslaw
    Knapik, Mateusz
    Glogowski, Szymon
    Przebinda, Lukasz
    INFORMATION, 2024, 15 (08)
  • [36] Thickness Nanoarchitectonics with Edge-Enhanced Raman, Polarization Raman, Optoelectronic Properties of GaS Nanosheets Devices
    Zhou, Fang
    Zhao, Yujing
    Fu, Feiya
    Liu, Li
    Luo, Zhixin
    CRYSTALS, 2023, 13 (10)
  • [37] SignEdgeLVM transformer model for enhanced sign language translation on edge devices
    Damdoo, Rina
    Kumar, Praveen
    DISCOVER COMPUTING, 2025, 28 (01)
  • [38] Self-Supervised Monocular Depth Estimation: Solving the Edge-Fattening Problem
    Chen, Xingyu
    Zhang, Ruonan
    Jiang, Ji
    Wang, Yan
    Li, Ge
    Li, Thomas H.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5765 - 5775
  • [39] ViT4Mal: Lightweight Vision Transformer for Malware Detection on Edge Devices
    Ravi, Akshara
    Chaturvedi, Vivek
    Shafique, Muhammad
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2023, 22 (05)
  • [40] Multilevel feature fusion and edge optimization network for self-supervised monocular depth estimation
    Liu, Guohua
    Niu, Shuqing
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (03)