Efficient multi-level cross-modal fusion and detection network for infrared and visible image

Cited by: 1

Authors:

Gao, Hongwei ^{[1,2]}

Wang, Yutong ^{[1]}

Sun, Jian ^{[1]}

Jiang, Yueqiu ^{[1]}

Gai, Yonggang ^{[1]}

Yu, Jiahui ^{[3,4]}

Affiliations:

[1] Shenyang Ligong Univ, Sch Automat & Elect Engn, Shenyang 110159, Peoples R China

[2] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Peoples R China

[3] Zhejiang Univ, Dept Biomed Engn, Hangzhou 310027, Peoples R China

[4] Binjiang Inst Zhejiang Univ, Innovat Ctr Smart Med Technol & Devices, Hangzhou 310053, Peoples R China

Source:

ALEXANDRIA ENGINEERING JOURNAL | 2024 / Vol. 108

Keywords:

Uncrewed aerial vehicles; Aerial image; Image fusion; Object detection;

DOI:

10.1016/j.aej.2024.07.107

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

With the rapid development of uncrewed aerial vehicle (UAV) technology, object detection in aerial images has found significant applications across various domains. However, existing algorithms overlook the impact of illumination on target detection, resulting in less satisfactory detection performance under low-light conditions. To overcome this issue, we propose EfficientFuseDet, a visible and infrared image fusion detection network. First, an effective multilevel cross-modal fusion network called EfficientFuse is presented to better combine complementary information from both modalities. EfficientFuse captures local dependencies and global contextual information in shallow and deep layers, seamlessly combining complementary local and global features throughout the network. The generated fused images exhibit clear target contours and abundant texture information. Second, we propose a detection network called AFI-YOLO, which employs an inverted residual vision transformer backbone (IRViT) to effectively address the challenges associated with background interference in fused images. We design an efficient feature pyramid network (EFPN) that integrates multiscale information, enhancing multiscale detection capability for aerial images. A reparameterization decoupling head (RepHead) is proposed to further improve target classification and localization precision. Finally, experiments on the DroneVehicle dataset indicate that the detection accuracy using fused images can reach 47.2%, which is higher than the 45% obtained with visible-light images. Compared with state-of-the-art detection algorithms, EfficientFuseDet exhibits a slight decrease in speed. However, it demonstrates superior detection capabilities and effectively enhances detection accuracy for aerial images under low-light conditions.

Pages: 306-318

Number of pages: 13
