MCAFNet: Multiscale cross-modality adaptive fusion network for multispectral object detection

Cited by: 0
Authors
Zheng, Shangpo [1 ]
Liu, Junfeng [1 ]
Zeng, Jun [2]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China
[2] South China Univ Technol, Sch Elect Power Engn, Guangzhou 510641, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Attention mechanism; cross-modality; multimodal adaptive feature fusion; multispectral object detection; transformer; PEDESTRIAN DETECTION;
DOI
10.1016/j.dsp.2025.104996
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Classification Code
0808; 0809
Abstract
Multispectral object detection techniques integrate data from various spectral modalities, such as combining thermal images with RGB visible-light images, to enhance the precision and robustness of object detection under diverse environmental conditions. Although this approach has improved detection capabilities, significant challenges remain in fully leveraging the specific detail information of each single modality and accurately capturing shared cross-modality features. To address these challenges, we propose a Multiscale Cross-modality Adaptive Fusion Network (MCAFNet). The network incorporates a Cross-Modality Interactive Transformer (CMIT) module, a Multimodal Adaptive Weighted Fusion (MAWF) module, and a 3D-Integrated Attention Feature Enhancement (3D-IAFE) module. These components work together to comprehensively extract complementary features between modalities and specific detailed features within each modality, thereby enhancing the accuracy and robustness of multimodal object detection. Extensive experimental validation and in-depth ablation studies confirm the effectiveness of the proposed method, which achieves state-of-the-art detection performance on multiple public datasets.
Pages: 12
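To make the adaptive fusion idea described in the abstract concrete, below is a minimal, illustrative sketch of an adaptive weighted fusion step for RGB and thermal feature maps, written in PyTorch. It is not the authors' MAWF implementation: the module name AdaptiveWeightedFusion, the pooled-feature gating MLP, and all tensor shapes are assumptions chosen only for illustration. The sketch predicts two scalar modality weights from globally pooled features and blends the two feature maps with them.

# Illustrative sketch only; hypothetical names and shapes, not the paper's MAWF module.
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    """Predict per-modality gating weights and fuse RGB and thermal feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        # Global average pooling followed by a small bottleneck MLP
        # that outputs two modality weights summing to 1.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 2),
            nn.Softmax(dim=1),  # weights for (rgb, thermal)
        )

    def forward(self, rgb_feat: torch.Tensor, th_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb_feat.shape
        pooled = torch.cat([self.pool(rgb_feat).view(b, c),
                            self.pool(th_feat).view(b, c)], dim=1)
        w = self.gate(pooled)                       # shape (B, 2)
        w_rgb = w[:, 0].view(b, 1, 1, 1)
        w_th = w[:, 1].view(b, 1, 1, 1)
        return w_rgb * rgb_feat + w_th * th_feat    # adaptively weighted fusion

if __name__ == "__main__":
    fuse = AdaptiveWeightedFusion(channels=256)
    rgb = torch.randn(2, 256, 40, 40)      # visible-light backbone features
    thermal = torch.randn(2, 256, 40, 40)  # thermal backbone features
    print(fuse(rgb, thermal).shape)         # torch.Size([2, 256, 40, 40])

A per-image scalar weight is the simplest form of adaptive fusion; spatial or channel-wise attention maps, as suggested by the paper's attention-based modules, would follow the same pattern but produce weights with spatial or channel dimensions instead of scalars.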