CrossFormer: Cross-guided attention for multi-modal object detection

被引:10
|
作者
Lee, Seungik [1 ]
Park, Jaehyeong [2 ]
Park, Jinsun [2 ,3 ]
机构
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea
基金
新加坡国家研究基金会;
关键词
Object detection; Multi-modal; Sensor fusion; TRANSFORMER;
D O I
10.1016/j.patrec.2024.02.012
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Object detection is one of the essential tasks in a variety of real -world applications such as autonomous driving and robotics. In a real -world scenario, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle the problem, we propose a novel multi -modal object detection model that is built upon a hierarchical transformer and cross -guidance between different modalities. The proposed hierarchical transformer consists of domain -specific feature extraction networks where intermediate features are connected by the proposed Cross -Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as a guide and the other is assigned to a base. After that, the cross -modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi -modal features simultaneously. Experimental results on FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi -modal detection algorithms quantitatively and qualitatively.
引用
收藏
页码:144 / 150
页数:7
相关论文
共 50 条
  • [41] Leveraging Uncertainties for Deep Multi-modal Object Detection in Autonomous Driving
    Feng, Di
    Cao, Yifan
    Rosenbaum, Lars
    Timm, Fabian
    Dietmayer, Klaus
    2020 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2020, : 871 - 878
  • [42] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
    Xiao, Yun
    Huang, Yameng
    Li, Chenglong
    Liu, Lei
    Zhou, Aiwu
    Tang, Jin
    COGNITIVE COMPUTATION, 2023, 15 (06) : 1868 - 1883
  • [43] Multi-Modal Fusion for Moving Object Detection in Static and Complex Backgrounds
    Jiang, Huali
    Li, Xin
    TRAITEMENT DU SIGNAL, 2023, 40 (05) : 1941 - 1950
  • [44] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
    Yun Xiao
    Yameng Huang
    Chenglong Li
    Lei Liu
    Aiwu Zhou
    Jin Tang
    Cognitive Computation, 2023, 15 : 1868 - 1883
  • [45] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [46] Hierarchical Multi-modal Contextual Attention Network for Fake News Detection
    Qian, Shengsheng
    Wang, Jinguang
    Hu, Jun
    Fang, Quan
    Xu, Changsheng
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 153 - 162
  • [47] Vehicle Detection of Multi-Modal Attention Fusion Under Different Illumination
    Wang, Jiaqi
    Zhang, Qi
    Huang, Wei
    Computer Engineering and Applications, 2024, 60 (16) : 116 - 123
  • [48] GRAPH ATTENTION MODEL EMBEDDED WITH MULTI-MODAL KNOWLEDGE FOR DEPRESSION DETECTION
    Zheng, Wenbo
    Yan, Lan
    Gou, Chao
    Wang, Fei-Yue
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [49] A Multi-modal Moving Object Detection Method Based on GrowCut Segmentation
    Zhang, Xiuwei
    Zhang, Yanning
    Maybank, Stephen John
    Liang, Jun
    2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE FOR MULTIMEDIA, SIGNAL AND VISION PROCESSING (CIMSIVP), 2014, : 213 - 218
  • [50] Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection
    Wang, Kunpeng
    Tu, Zhengzheng
    Li, Chenglong
    Zhang, Cheng
    Luo, Bin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7344 - 7358