Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

Times cited: 4
Authors
Yang, Yang [1]
Xu, Kaixiong [1]
Wang, Kaizheng [2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Elect Engn, Kunming, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multispectral pedestrian detection; attention mechanism; feature fusion; convolutional neural network; background noise; IMAGE FUSION; NETWORK;
DOI
10.3389/fphy.2023.1121311
Chinese Library Classification
O4 [Physics];
Discipline classification code
0702;
Abstract
Multispectral pedestrian detection aims to detect and locate pedestrians in Color and Thermal images, and it has been widely used in automatic driving, video surveillance, and similar applications. Most existing multispectral pedestrian detection algorithms have achieved only limited success because they do not account for the confusion between pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm that mainly consists of a cascaded information enhancement module and a cross-modal attention feature fusion module. The cascaded information enhancement module applies channel and spatial attention to the features fused by the cascaded feature fusion block, then multiplies the single-modal features by the resulting attention weights element by element. This enhances the pedestrian features in each modality and suppresses interference from the background. The cross-modal attention feature fusion module mines the features of the Color and Thermal modalities so that they complement each other, constructs global features by adding the cross-modal complemented features element by element, and applies attention weighting to these global features to fuse the two modalities effectively. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments were performed on two improved versions of the annotations (sanitized annotations and paired annotations) of the public KAIST dataset. The experimental results show that our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the compared methods. In addition, the ablation experiments confirm the effectiveness of each module designed in this paper.
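The data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer details, so the learned convolutions of a real network are replaced here by parameter-free pooling followed by a sigmoid, the cascaded feature fusion block is replaced by a simple average, and all function names (`attention_weights`, `enhance`, `cross_modal_fuse`) are hypothetical. It is only meant to show the order of operations: attention computed on fused features, element-wise reweighting of each modality, cross-modal complementation, element-wise summation, and a final attention weighting.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_weights(feat):
    """Channel-then-spatial attention for a (C, H, W) feature tensor.

    Assumption: pooling + sigmoid stands in for learned attention layers.
    """
    # Channel attention: one weight per channel from global average pooling.
    channel = sigmoid(feat.mean(axis=(1, 2), keepdims=True))   # (C, 1, 1)
    # Spatial attention: one weight per location from the channel-wise mean.
    spatial = sigmoid((feat * channel).mean(axis=0, keepdims=True))  # (1, H, W)
    return channel * spatial                                   # (C, H, W)


def enhance(color, thermal):
    """Simplified cascaded information enhancement."""
    # Assumption: a plain average stands in for the cascaded fusion block.
    fused = 0.5 * (color + thermal)
    w = attention_weights(fused)
    # Multiply each single-modal feature by the attention weights.
    return color * w, thermal * w


def cross_modal_fuse(color, thermal):
    """Simplified cross-modal attention feature fusion."""
    # Each modality is complemented by attention-gated features of the other.
    color_c = color + thermal * attention_weights(thermal)
    thermal_c = thermal + color * attention_weights(color)
    # Global features: element-wise sum of the complemented features.
    global_feat = color_c + thermal_c
    # Attention weighting of the global features gives the fused output.
    return global_feat * attention_weights(global_feat)


rng = np.random.default_rng(0)
color = rng.standard_normal((8, 16, 16))    # toy Color feature map
thermal = rng.standard_normal((8, 16, 16))  # toy Thermal feature map
color_e, thermal_e = enhance(color, thermal)
fused = cross_modal_fuse(color_e, thermal_e)
print(fused.shape)  # (8, 16, 16)
```

In this sketch the fused tensor keeps the input shape, so it could be passed to a detection head without further reshaping.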
Pages: 11