Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer

Cited by: 0
Authors
Xiao, Baihui [1 ]
Xu, Jingzehua [1 ]
Zhang, Zekai [1 ]
Xing, Tianyu [1 ]
Wang, Jingjing [2 ]
Ren, Yong [3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2024, PT II | 2024, Vol. 15017
Funding
National Natural Science Foundation of China
Keywords
Frame Camera; Event Camera; Multi-modal Fusion; Transformer self-attention; Monocular depth estimation; VISION;
DOI
10.1007/978-3-031-72335-3_29
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Frame cameras struggle to estimate depth maps accurately under abnormal lighting conditions. In contrast, event cameras, with their high temporal resolution and high dynamic range, capture sparse, asynchronous event streams that record per-pixel brightness changes, compensating for the limitations of frame cameras. However, the potential of asynchronous events remains underexploited, which hinders event cameras from predicting dense depth maps effectively. Integrating event streams with frame data can significantly improve monocular depth estimation accuracy, especially in complex scenarios. In this study, we introduce a novel depth estimation framework that fuses event and frame data using a transformer-based model. The proposed framework comprises two primary components: a multimodal encoder and a joint decoder. The multimodal encoder employs self-attention to model interactions between frame patches and event tensors, capturing dependencies across local and global spatiotemporal events. This multi-scale fusion approach maximizes the benefits of both event and frame inputs. The joint decoder incorporates a dual-phase, triple-scale feature fusion module that extracts contextual information and delivers detailed depth predictions. Experimental results on the EventScape and MVSEC datasets confirm that our method sets a new benchmark in performance.
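The abstract describes the fusion mechanism only at a high level; the sketch below illustrates the general idea of joint frame-event self-attention in PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the module name MultimodalFusionEncoder, the 5-bin event voxelization, the patch size, embedding width, and layer count are all assumptions, and the paper's multi-scale fusion and dual-phase, triple-scale decoder are not reproduced here.

```python
# Minimal sketch (not the authors' released code) of one way to fuse frame
# patches and event tensors with transformer self-attention: both modalities
# are embedded into a shared token space, concatenated, and encoded jointly.
import torch
import torch.nn as nn


class MultimodalFusionEncoder(nn.Module):  # hypothetical module name
    def __init__(self, img_size=224, patch=16, event_bins=5, dim=256,
                 depth=4, heads=8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Separate patch embeddings for the RGB frame and the event tensor
        # (events assumed voxelized into `event_bins` temporal channels).
        self.frame_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.event_embed = nn.Conv2d(event_bins, dim, kernel_size=patch,
                                     stride=patch)
        # Learned positions plus a modality flag distinguish the token sets.
        self.pos = nn.Parameter(torch.zeros(1, 2 * n_patches, dim))
        self.modality = nn.Parameter(torch.zeros(1, 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, frame, events):
        # frame: (B, 3, H, W); events: (B, event_bins, H, W)
        f = self.frame_embed(frame).flatten(2).transpose(1, 2)   # (B, N, dim)
        e = self.event_embed(events).flatten(2).transpose(1, 2)  # (B, N, dim)
        f = f + self.modality[:, 0:1]
        e = e + self.modality[:, 1:2]
        tokens = torch.cat([f, e], dim=1) + self.pos
        # Self-attention over the joint sequence lets every frame patch
        # attend to every event patch and vice versa (cross-modal fusion).
        return self.encoder(tokens)


# Usage: fused tokens would feed a depth-decoding head (not sketched here).
enc = MultimodalFusionEncoder()
fused = enc(torch.randn(2, 3, 224, 224), torch.randn(2, 5, 224, 224))
print(fused.shape)  # torch.Size([2, 392, 256])
```

Concatenating the two token sequences before self-attention means each attention layer mixes information within and across modalities in a single pass; the paper's multi-scale fusion would presumably apply a similar interaction at several feature resolutions.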
Pages: 419-433 (15 pages)
Related Papers (50 total)
  • [21] Efficient Unsupervised Monocular Depth Estimation with Inter-Frame Depth Interpolation
    Zhang, Min
    Li, Jianhua
    IMAGE AND GRAPHICS (ICIG 2021), PT III, 2021, 12890 : 729 - 741
  • [22] MobileDepth: Monocular Depth Estimation Based on Lightweight Vision Transformer
    Li, Yundong
    Wei, Xiaokun
    APPLIED ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [23] METER: A Mobile Vision Transformer Architecture for Monocular Depth Estimation
    Papa, Lorenzo
    Russo, Paolo
    Amerini, Irene
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5882 - 5893
  • [24] Edge-Aware Monocular Dense Depth Estimation with Morphology
    Li, Zhi
    Zhu, Xiaoyang
    Yu, Haitao
    Zhang, Qi
    Jiang, Yongshi
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2935 - 2942
  • [25] Monocular Depth Estimation by Two-Frame Triangulation using Flat Surface Constraints
    Kaneko, Alex M.
    Yamamoto, Kenjiro
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 574 - 581
  • [26] Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion
    Sagar, Abhinav
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 656 - 662
  • [27] MODE: Monocular omnidirectional depth estimation via consistent depth fusion
    Liu, Yunbiao
    Chen, Chunyi
    IMAGE AND VISION COMPUTING, 2023, 136
  • [28] Robust Multimodal Depth Estimation using Transformer based Generative Adversarial Networks
    Khan, Md Fahim Faysal
    Devulapally, Anusha
    Advani, Siddharth
    Narayanan, Vijaykrishnan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3559 - 3568
  • [29] Adaptive Baseline Monocular Dense Mapping with Inter-frame Depth Propagation
    Wang, Kaixuan
    Shen, Shaojie
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 3225 - 3232
  • [30] Monocular Depth Estimation Using Relative Depth Maps
    Lee, Jae-Han
    Kim, Chang-Su
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 9721 - 9730