Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer

Times Cited: 0
Authors
Xiao, Baihui [1 ]
Xu, Jingzehua [1 ]
Zhang, Zekai [1 ]
Xing, Tianyu [1 ]
Wang, Jingjing [2 ]
Ren, Yong [3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2024, PT II | 2024, Vol. 15017
Funding
National Natural Science Foundation of China;
Keywords
Frame Camera; Event Camera; Multi-modal Fusion; Transformer self-attention; Monocular depth estimation; VISION;
DOI
10.1007/978-3-031-72335-3_29
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Frame cameras struggle to estimate depth maps accurately under challenging lighting conditions. In contrast, event cameras, with their high temporal resolution and high dynamic range, capture sparse, asynchronous event streams that record per-pixel brightness changes, addressing the limitations of frame cameras. However, the potential of asynchronous events remains underexploited, which hinders the ability of event cameras to predict dense depth maps effectively. Integrating event streams with frame data can significantly enhance monocular depth estimation accuracy, especially in complex scenarios. In this study, we introduce a novel depth estimation framework that combines event and frame data using a transformer-based model. The proposed framework contains two primary components: a multimodal encoder and a joint decoder. The multimodal encoder employs self-attention to model the interactions between frame patches and event tensors, capturing dependencies across local and global spatiotemporal events. This multi-scale fusion approach maximizes the benefits of both event and frame inputs. The joint decoder incorporates a dual-phase, triple-scale feature fusion module, which extracts contextual information and delivers detailed depth predictions. Experimental results on the EventScape and MVSEC datasets confirm that our method sets a new performance benchmark.
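The record gives no implementation details, but the encoder described in the abstract (joint self-attention over frame patches and event tensors) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration, not the authors' code: the module name, the voxel-grid event representation, the patch size, and all dimensions are hypothetical.

```python
# Minimal sketch of event-frame fusion via transformer self-attention,
# loosely following the abstract. The event stream is assumed to be
# voxelized into a fixed number of time bins; all hyperparameters are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn


class EventFrameFusionEncoder(nn.Module):
    """Tokenizes frame patches and event voxel-grid patches, then applies
    joint self-attention so each token can attend across both modalities."""

    def __init__(self, img_channels=3, event_bins=5, patch=16, dim=256,
                 depth=4, heads=8):
        super().__init__()
        # One patch-embedding projection per modality (hypothetical design).
        self.frame_embed = nn.Conv2d(img_channels, dim, patch, stride=patch)
        self.event_embed = nn.Conv2d(event_bins, dim, patch, stride=patch)
        # Learned modality embeddings distinguish frame vs. event tokens.
        self.modality = nn.Parameter(torch.zeros(2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frame, events):
        # frame:  (B, 3, H, W) intensity image
        # events: (B, bins, H, W) event stream voxelized into time bins
        f = self.frame_embed(frame).flatten(2).transpose(1, 2)   # (B, N, dim)
        e = self.event_embed(events).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([f + self.modality[0],
                            e + self.modality[1]], dim=1)
        # Joint self-attention: every token attends over both modalities,
        # modeling the cross-modal dependencies described in the abstract.
        return self.encoder(tokens)


# Usage with toy tensors.
enc = EventFrameFusionEncoder()
frame = torch.randn(2, 3, 224, 224)
events = torch.randn(2, 5, 224, 224)
fused = enc(frame, events)
print(fused.shape)  # torch.Size([2, 392, 256]): 2 * (224/16)^2 tokens
```

A joint decoder with multi-scale fusion would then upsample these fused tokens back to a dense depth map; that stage is omitted here, as the abstract gives only its high-level structure (dual-phase, triple-scale feature fusion).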
Pages: 419-433
Page Count: 15