CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10
Authors
Lee, Seungik [1 ]
Park, Jaehyeong [2 ]
Park, Jinsun [2 ,3 ]
Affiliations
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Object detection; Multi-modal; Sensor fusion; Transformer;
DOI
10.1016/j.patrec.2024.02.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object detection is an essential task in a variety of real-world applications such as autonomous driving and robotics. Real-world scenarios, however, pose numerous challenges, including illumination changes, adverse weather conditions, and geographical changes. To tackle these problems, we propose a novel multi-modal object detection model built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as the guide and the other as the base; the cross-modal attention from the guide to the base is then applied to the base feature. The CGAM works bidirectionally in parallel, exchanging the roles of the two modalities to refine the multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms both quantitatively and qualitatively.
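
The guide/base role exchange described in the abstract can be illustrated with a minimal sketch. This is not the authors' CGAM: it assumes plain scaled dot-product cross-attention with no learned projections, queries taken from the base tokens, keys and values taken from the guide tokens, and a residual connection onto the base. The function names (`cross_attend`, `bidirectional_cross_guidance`) and the RGB/thermal pairing are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(base, guide):
    """Refine `base` tokens with information attended from `guide` tokens.

    Assumption: identity projections, so queries are the base tokens and
    keys/values are the guide tokens. Both are lists of d-dimensional lists.
    """
    d = len(base[0])
    scale = 1.0 / math.sqrt(d)
    refined = []
    for q in base:
        # Similarity of this base token to every guide token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in guide]
        weights = softmax(scores)
        # Weighted sum of guide tokens, one value per feature dimension.
        attended = [sum(w * v[j] for w, v in zip(weights, guide)) for j in range(d)]
        # Residual connection: the base feature is enriched, not replaced.
        refined.append([qi + ai for qi, ai in zip(q, attended)])
    return refined


def bidirectional_cross_guidance(rgb, thermal):
    """Apply cross-guidance in both directions by swapping the roles.

    Both calls read the original inputs, so neither depends on the other
    and they could be computed in parallel.
    """
    rgb_refined = cross_attend(base=rgb, guide=thermal)
    thermal_refined = cross_attend(base=thermal, guide=rgb)
    return rgb_refined, thermal_refined


# Toy inputs: 2 RGB tokens and 3 thermal tokens, each 2-dimensional.
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
rgb_out, thermal_out = bidirectional_cross_guidance(rgb, thermal)
```

Each output keeps the token count and feature width of its own base modality, so the refined features could be passed on to the next stage of either branch unchanged in shape.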
Pages: 144-150
Number of pages: 7