Efficient multi-level cross-modal fusion and detection network for infrared and visible image

Cited by: 1
Authors
Gao, Hongwei [1, 2]
Wang, Yutong [1]
Sun, Jian [1]
Jiang, Yueqiu [1]
Gai, Yonggang [1]
Yu, Jiahui [3, 4]
Affiliations
[1] Shenyang Ligong Univ, Sch Automat & Elect Engn, Shenyang 110159, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Peoples R China
[3] Zhejiang Univ, Dept Biomed Engn, Hangzhou 310027, Peoples R China
[4] Binjiang Inst Zhejiang Univ, Innovat Ctr Smart Med Technol & Devices, Hangzhou 310053, Peoples R China
Keywords
Uncrewed aerial vehicles; Aerial image; Image fusion; Object detection
DOI
10.1016/j.aej.2024.07.107
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
With the rapid development of uncrewed aerial vehicle (UAV) technology, object detection in aerial images has found significant applications across various domains. However, existing algorithms overlook the impact of illumination on target detection, resulting in unsatisfactory detection performance under low-light conditions. To overcome this issue, we propose EfficientFuseDet, a visible and infrared image fusion and detection network. First, an effective multilevel cross-modal fusion network called EfficientFuse is presented to better combine complementary information from both modalities. EfficientFuse captures local dependencies and global contextual information in shallow and deep layers, seamlessly combining complementary local and global features throughout the network. The generated fused images exhibit clear target contours and abundant texture information. Second, we propose a detection network called AFI-YOLO, which employs an inverted residual vision transformer backbone (IRViT) to address the background interference present in fused images. We design an efficient feature pyramid network (EFPN) that integrates multiscale information, enhancing multiscale detection capability on aerial images. A reparameterization decoupling head (RepHead) is proposed to further improve target classification and localization precision. Finally, experiments on the DroneVehicle dataset indicate that the detection accuracy with fused images reaches 47.2%, higher than the 45% obtained with visible-light images. Compared to state-of-the-art detection algorithms, EfficientFuseDet is slightly slower, but it demonstrates superior detection capability and effectively enhances detection accuracy on aerial images under low-light conditions.
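The abstract does not describe how RepHead is built, so the following is only a generic illustration of structural reparameterization, the idea the name suggests: because convolution is linear, parallel branches used during training can be folded into a single kernel for inference. The sketch assumes a single channel, no bias, and no batch normalization; the helper names `conv2d_same` and `merge_branches` are hypothetical and are not taken from the paper.

```python
# Generic reparameterization sketch (assumption: NOT the paper's RepHead).
# A 3x3 branch plus a parallel 1x1 branch equals one merged 3x3 convolution.

def conv2d_same(img, kernel):
    """Zero-padded 'same' cross-correlation of a 2-D list with an odd square kernel."""
    k = len(kernel)
    pad = k // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel into the centre tap of a copy of a 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


# Toy input and kernels (illustrative values only).
img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
k1 = [[0.5]]

# Training-time form: two parallel branches whose outputs are summed.
out3 = conv2d_same(img, k3)
out1 = conv2d_same(img, k1)
two_branch = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(out3, out1)]

# Inference-time form: a single convolution with the merged kernel.
single = conv2d_same(img, merge_branches(k3, k1))

max_diff = max(abs(a - b) for ra, rb in zip(two_branch, single) for a, b in zip(ra, rb))
print("max difference between the two forms:", max_diff)
```

The merged form produces the same output with one convolution instead of two, which is why reparameterized designs are commonly used to reduce inference cost without changing the trained function.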
Pages: 306-318
Page count: 13