Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

Cited: 4
Authors
Yang, Yang [1 ]
Xu, Kaixiong [1 ]
Wang, Kaizheng [2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Elect Engn, Kunming, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
multispectral pedestrian detection; attention mechanism; feature fusion; convolutional neural network; background noise; IMAGE FUSION; NETWORK;
DOI
10.3389/fphy.2023.1121311
Chinese Library Classification
O4 [Physics];
Discipline code
0702;
Abstract
Multispectral pedestrian detection aims to detect and locate pedestrians in paired Color and Thermal images, and it has been widely applied in autonomous driving, video surveillance, etc. Most existing multispectral pedestrian detection algorithms achieve only limited success because they fail to account for the confusion between pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm that consists mainly of a cascaded information enhancement module and a cross-modal attention feature fusion module. On the one hand, the cascaded information enhancement module applies channel and spatial attention to the features fused by the cascaded feature fusion block, then multiplies the single-modal features element-wise by the attention weights to enhance the pedestrian features in each modality and thus suppress background interference. On the other hand, the cross-modal attention feature fusion module mines the features of the Color and Thermal modalities so that they complement each other; the global features are then constructed by adding the cross-modal complemented features element-wise and are attention-weighted to achieve an effective fusion of the two modalities. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments were performed on two improved annotation sets (sanitized annotations and paired annotations) of the public KAIST dataset. The experimental results show that our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the comparison methods. Ablation experiments further verify the effectiveness of each module designed in this paper.
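The attention-weighting and element-wise fusion steps described in the abstract can be sketched numerically. The following is a minimal illustrative sketch, not the authors' implementation: it uses simple global-average-pooling channel attention with a sigmoid (the paper's modules presumably use learned convolutional attention, and also include spatial attention), and the function names `cascaded_enhance` and `cross_modal_fuse` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Toy channel attention: global-average-pool each channel of a
    (C, H, W) feature map and squash to (0, 1) weights."""
    pooled = feat.mean(axis=(1, 2))      # (C,)
    w = sigmoid(pooled)                  # (C,)
    return w[:, None, None]              # broadcastable to (C, H, W)

def cascaded_enhance(color, thermal):
    """Cascaded information enhancement (sketch): attention weights are
    computed from the fused features, then each single-modal feature map
    is multiplied element-wise by those weights to suppress background."""
    fused = color + thermal
    w = channel_attention(fused)
    return color * w, thermal * w

def cross_modal_fuse(color, thermal):
    """Cross-modal attention feature fusion (sketch): each modality is
    complemented by the attention-weighted other modality, the global
    feature is the element-wise sum, and that sum is attention-weighted."""
    color_c = color + channel_attention(thermal) * thermal
    thermal_c = thermal + channel_attention(color) * color
    global_feat = color_c + thermal_c
    return global_feat * channel_attention(global_feat)
```

In the full pipeline, the output of `cross_modal_fuse` would be passed to a detection head; here the sketch only shows how attention weighting and element-wise multiplication/addition combine the two modalities.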
Pages: 11