CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10
Authors
Lee, Seungik [1]
Park, Jaehyeong [2]
Park, Jinsun [2,3]
Affiliations
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Object detection; Multi-modal; Sensor fusion; Transformer;
DOI
10.1016/j.patrec.2024.02.012
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In real-world scenarios, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these problems, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one modality is regarded as the guide and the other as the base. After that, the cross-modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms both quantitatively and qualitatively.
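The guide/base mechanism in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch rendering of the idea: queries are drawn from the base modality, keys and values from the guide, and two such blocks run in parallel with the roles exchanged. The abstract does not specify implementation details, so the module names, the RGB/thermal pairing (suggested only by the RGB-thermal datasets used), the residual connection, the layer normalization, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of cross-guided attention as described in the abstract.
# All design details (heads, norm, residual, dims) are assumptions.
import torch
import torch.nn as nn


class CrossGuidedAttention(nn.Module):
    """Cross-modal attention from a guide feature to a base feature.

    Queries come from the base modality; keys and values come from the
    guide modality, so the guide steers the refinement of the base.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, base: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # base, guide: (batch, tokens, dim) flattened intermediate features
        attended, _ = self.attn(query=base, key=guide, value=guide)
        # Residual refinement of the base feature (an assumed design choice)
        return self.norm(base + attended)


class CGAM(nn.Module):
    """Bidirectional variant: each modality acts as guide for the other,
    refining both feature streams in parallel by exchanging roles."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.rgb_from_thermal = CrossGuidedAttention(dim, num_heads)
        self.thermal_from_rgb = CrossGuidedAttention(dim, num_heads)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # Both directions read the original features, so the two
        # refinements are order-independent and run in parallel.
        rgb_refined = self.rgb_from_thermal(base=rgb, guide=thermal)
        thermal_refined = self.thermal_from_rgb(base=thermal, guide=rgb)
        return rgb_refined, thermal_refined


if __name__ == "__main__":
    rgb = torch.randn(2, 196, 256)      # e.g. 14x14 tokens, 256-dim features
    thermal = torch.randn(2, 196, 256)
    out_rgb, out_thermal = CGAM(dim=256)(rgb, thermal)
    print(out_rgb.shape, out_thermal.shape)  # both torch.Size([2, 196, 256])
```

Feeding both directions the original (unrefined) features keeps the two refinements independent of each other, which matches the abstract's claim that the CGAM refines the multi-modal features simultaneously rather than sequentially.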
Pages: 144-150
Page count: 7