CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10
Authors
Lee, Seungik [1]
Park, Jaehyeong [2]
Park, Jinsun [2,3]
Affiliations
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak-ro 63beon-gil, Busan 46241, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Object detection; Multi-modal; Sensor fusion; Transformer;
DOI
10.1016/j.patrec.2024.02.012
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In real-world scenarios, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these problems, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one modality is regarded as the guide and the other as the base. After that, the cross-modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms both quantitatively and qualitatively.
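The guide/base mechanism in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch rendering of the idea: queries are drawn from the base modality, keys and values from the guide, and two such blocks run in parallel with the roles exchanged. The abstract does not specify implementation details, so the module names, the RGB/thermal pairing (suggested only by the RGB-thermal datasets used), the residual connection, the layer normalization, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of cross-guided attention as described in the abstract.
# All design details (heads, norm, residual, dims) are assumptions.
import torch
import torch.nn as nn


class CrossGuidedAttention(nn.Module):
    """Cross-modal attention from a guide feature to a base feature.

    Queries come from the base modality; keys and values come from the
    guide modality, so the guide steers the refinement of the base.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, base: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # base, guide: (batch, tokens, dim) flattened intermediate features
        attended, _ = self.attn(query=base, key=guide, value=guide)
        # Residual refinement of the base feature (an assumed design choice)
        return self.norm(base + attended)


class CGAM(nn.Module):
    """Bidirectional variant: each modality acts as guide for the other,
    refining both feature streams in parallel by exchanging roles."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.rgb_from_thermal = CrossGuidedAttention(dim, num_heads)
        self.thermal_from_rgb = CrossGuidedAttention(dim, num_heads)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # Both directions read the original features, so the two
        # refinements are order-independent and run in parallel.
        rgb_refined = self.rgb_from_thermal(base=rgb, guide=thermal)
        thermal_refined = self.thermal_from_rgb(base=thermal, guide=rgb)
        return rgb_refined, thermal_refined


if __name__ == "__main__":
    rgb = torch.randn(2, 196, 256)      # e.g. 14x14 tokens, 256-dim features
    thermal = torch.randn(2, 196, 256)
    out_rgb, out_thermal = CGAM(dim=256)(rgb, thermal)
    print(out_rgb.shape, out_thermal.shape)  # both torch.Size([2, 196, 256])
```

Feeding both directions the original (unrefined) features keeps the two refinements independent of each other, which matches the abstract's claim that the CGAM refines the multi-modal features simultaneously rather than sequentially.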
Pages: 144-150
Page count: 7