Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

Cited: 4
Authors
Yang, Yang [1 ]
Xu, Kaixiong [1 ]
Wang, Kaizheng [2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Elect Engn, Kunming, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multispectral pedestrian detection; attention mechanism; feature fusion; convolutional neural network; background noise; IMAGE FUSION; NETWORK;
DOI
10.3389/fphy.2023.1121311
Chinese Library Classification
O4 [Physics];
Discipline Code
0702;
Abstract
Multispectral pedestrian detection is a technology for detecting and locating pedestrians in Color and Thermal images, and it has been widely used in autonomous driving, video surveillance, etc. To date, most available multispectral pedestrian detection algorithms have achieved only limited success because they fail to account for the confusion between pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm that mainly consists of a cascaded information enhancement module and a cross-modal attention feature fusion module. On the one hand, the cascaded information enhancement module applies channel and spatial attention to the features fused by the cascaded feature fusion block, and multiplies the single-modal features element-wise by the resulting attention weights, enhancing the pedestrian features in each modality and suppressing background interference. On the other hand, the cross-modal attention feature fusion module mines the features of the Color and Thermal modalities so that they complement each other; the global features are then constructed by adding the cross-modal complemented features element-wise, and these are attention-weighted to achieve an effective fusion of the two modal features. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments have been performed on two improved versions of the annotations (sanitized annotations and paired annotations) of the public KAIST dataset. The experimental results show that our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the comparison methods. In addition, ablation experiments confirm the effectiveness of each module designed in this paper.
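The abstract's two mechanisms (attention-weighted single-modal enhancement, then element-wise cross-modal fusion) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the actual modules use learned convolutional layers, whereas here the channel and spatial attention weights are computed with parameter-free pooling plus a sigmoid, and all function names (`enhance`, `cross_modal_fuse`, etc.) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W) -> per-channel gate in (0, 1), shape (C, 1, 1)
    return sigmoid(feat.mean(axis=(1, 2)))[:, None, None]

def spatial_attention(feat):
    # collapse channels -> per-location gate in (0, 1), shape (1, H, W)
    return sigmoid(feat.mean(axis=0, keepdims=True))

def enhance(single_modal, fused):
    # Cascaded information enhancement: attention weights are computed on
    # the fused features and multiplied element-wise into the single-modal
    # features, amplifying pedestrian regions and damping background.
    w = channel_attention(fused) * spatial_attention(fused)
    return single_modal * w

def cross_modal_fuse(color_feat, thermal_feat):
    # Stand-in for the cascaded feature fusion block: element-wise sum.
    fused = color_feat + thermal_feat
    color_e = enhance(color_feat, fused)
    thermal_e = enhance(thermal_feat, fused)
    # Global features: element-wise sum of the complemented modalities,
    # followed by a final attention weighting before the detection head.
    global_feat = color_e + thermal_e
    return global_feat * channel_attention(global_feat)
```

Because every step is an element-wise gate or sum, the output keeps the input feature-map shape, so it can be passed directly to a detection head.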
Pages: 11
Related Papers
(50 records total)
  • [31] Cross-modal pedestrian re-recognition based on attention mechanism
    Zhao, Yuyao
    Zhou, Hang
    Cheng, Hai
    Huang, Chunguang
    VISUAL COMPUTER, 2024, 40 (04): 2405 - 2418
  • [32] Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
    Xu, Dan
    Ouyang, Wanli
    Ricci, Elisa
    Wang, Xiaogang
    Sebe, Nicu
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4236 - 4244
  • [33] INSANet: INtra-INter Spectral Attention Network for Effective Feature Fusion of Multispectral Pedestrian Detection
    Lee, Sangin
    Kim, Taejoo
    Shin, Jeongmin
    Kim, Namil
    Choi, Yukyung
    SENSORS, 2024, 24 (04)
  • [34] Deep Feature Fusion by Competitive Attention for Pedestrian Detection
    Chen, Zhichang
    Zhang, Li
    Khattak, Abdul Mateen
    Gao, Wanlin
    Wang, Minjuan
    IEEE ACCESS, 2019, 7 : 21981 - 21989
  • [35] ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
    Shen, Jifeng
    Chen, Yifei
    Liu, Yue
    Zuo, Xin
    Fan, Heng
    Yang, Wankou
    PATTERN RECOGNITION, 2024, 145
  • [36] Cross-modal image fusion guided by subjective visual attention
    Fang, Aiqing
    Zhao, Xinbo
    Zhang, Yanning
    NEUROCOMPUTING, 2020, 414: 333 - 345
  • [37] Neural substrates of perceptual enhancement by cross-modal spatial attention
    McDonald, JJ
    Teder-Sälejärvi, WA
    Di Russo, F
    Hillyard, SA
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2003, 15 (01) : 10 - 19
  • [38] Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection
    Meng, Lingbing
    Yuan, Mengya
    Shi, Xuehan
    Liu, Qingqing
    Zhange, Le
    Wu, Jinhua
    Dai, Ping
    Cheng, Fei
    ADVANCES IN MULTIMEDIA, 2023, 2023
  • [39] Deep Label Feature Fusion Hashing for Cross-Modal Retrieval
    Ren, Dongxiao
    Xu, Weihua
    Wang, Zhonghua
    Sun, Qinxiu
    IEEE ACCESS, 2022, 10 : 100276 - 100285
  • [40] CMFFN: An efficient cross-modal feature fusion network for semantic
    Zhang, Yingjian
    Li, Ning
    Jiao, Jichao
    Ai, Jiawen
    Yan, Zheng
    Zeng, Yingchao
    Zhang, Tianxiang
    Li, Qian
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2025, 186