Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking

Times cited: 8
Authors
Luo, Yang [1, 2]
Guo, Xiqing [1, 2]
Dong, Mingtao [3]
Yu, Jin [1, 2]
Affiliations
[1] Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
[2] University of Chinese Academy of Sciences, Beijing 100040, China
[3] Institute of Image Recognition and Machine Intelligence, Northeastern University, Shenyang 110167, China
Keywords
multi-modality adaptive fusion; mixed-attention mechanism; RGB-T tracking; network
DOI
10.3390/s23146609
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Subject classification codes
070302; 081704
Abstract
RGB-T tracking uses images from both the visible and thermal modalities. Its primary objective is to adaptively exploit whichever modality is relatively dominant under varying conditions, so that tracking is more robust than with a single modality. This paper proposes MACFT, an RGB-T tracker that uses a mixed-attention mechanism to achieve complementary fusion of the modalities. In the feature extraction stage, separate transformer backbone branches extract modality-specific and modality-shared information. Mixed-attention operations in the backbone enable information interaction and self-enhancement between the template and search images, yielding a robust feature representation that better captures the high-level semantic features of the target. In the feature fusion stage, a modality shared-specific feature interaction structure based on the mixed-attention mechanism suppresses noise from the low-quality modality while enhancing information from the dominant modality. Evaluations on multiple public RGB-T datasets show that the proposed tracker outperforms other RGB-T trackers on general evaluation metrics and can also adapt to long-term tracking scenarios.
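The mixed-attention idea in the abstract can be illustrated with a small sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' MACFT implementation: it assumes "mixed attention" means single-head scaled dot-product attention, softmax(QK^T / sqrt(d)) V, computed over the concatenation of template and search tokens (the formulation used in MixFormer-style trackers). The function name `mixed_attention`, the token counts, and the random projection weights are hypothetical. Because both token sets share one attention map, each template token attends to search tokens (interaction) and to other template tokens (self-enhancement), and vice versa. The demo also runs each modality through one shared set of projections and one modality-specific set, as a loose, hypothetical stand-in for the shared and specific backbone branches.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mixed_attention(template, search, w_q, w_k, w_v):
    """Single-head mixed attention (hypothetical sketch).

    template: (Nt, C) template tokens; search: (Ns, C) search tokens.
    The two token sets are concatenated, so one attention map covers
    template-template, template-search, search-template and
    search-search pairs.
    """
    tokens = np.concatenate([template, search], axis=0)  # (Nt+Ns, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scale = 1.0 / np.sqrt(q.shape[-1])
    attn = softmax(q @ k.T * scale, axis=-1)  # (Nt+Ns, Nt+Ns), rows sum to 1
    out = attn @ v
    nt = template.shape[0]
    # Split the updated tokens back into template and search parts.
    return out[:nt], out[nt:], attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 8  # token embedding dimension (illustrative)
    # Hypothetical projection weights: one set shared by both modalities
    # and one specific set per modality.
    shared = [rng.standard_normal((C, C)) for _ in range(3)]
    specific = {m: [rng.standard_normal((C, C)) for _ in range(3)] for m in ("rgb", "tir")}
    for m in ("rgb", "tir"):
        template = rng.standard_normal((4, C))  # e.g. 2x2 template patch tokens
        search = rng.standard_normal((16, C))  # e.g. 4x4 search patch tokens
        _, s_shared, _ = mixed_attention(template, search, *shared)
        _, s_specific, _ = mixed_attention(template, search, *specific[m])
        print(m, s_shared.shape, s_specific.shape)
```

In a real tracker the projections would be learned, attention would be multi-head and stacked over many layers, and the search-token outputs of both branches would feed the fusion stage and the prediction head; those parts are omitted here.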
Pages: 19
Related papers
50 in total
  • [31] Object Fusion Tracking for RGB-T Images via Channel Swapping and Modal Mutual Attention
    Luan, Tian
    Zhang, Hui
    Li, Jiafeng
    Zhang, Jing
    Zhuo, Li
    IEEE SENSORS JOURNAL, 2023, 23 (19) : 22930 - 22943
  • [32] Learning Multiscale Deep Features and SVM Regressors for Adaptive RGB-T Saliency Detection
    Ma, Yunpeng
    Sun, Dengdi
    Meng, Qianqian
    Ding, Zhuanlian
    Li, Chenglong
    2017 10TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL. 1, 2017, : 389 - 392
  • [33] Region Selective Fusion Network for Robust RGB-T Tracking
    Yu, Zhencheng
    Fan, Huijie
    Wang, Qiang
    Li, Ziwan
    Tang, Yandong
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1357 - 1361
  • [34] Modal complementary fusion network for RGB-T salient object detection
    Ma, Shuai
    Song, Kechen
    Dong, Hongwen
    Tian, Hongkun
    Yan, Yunhui
    APPLIED INTELLIGENCE, 2023, 53 (08) : 9038 - 9055
  • [35] Learning Local-Global Multi-Graph Descriptors for RGB-T Object Tracking
    Li, Chenglong
    Zhu, Chengli
    Zhang, Jian
    Luo, Bin
    Wu, Xiaohao
    Tang, Jin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (10) : 2913 - 2926
  • [36] Bridging Search Region Interaction with Template for RGB-T Tracking
    Hui, Tianrui
    Xun, Zizheng
    Peng, Fengguang
    Huang, Junshi
    Wei, Xiaoming
    Wei, Xiaolin
    Dai, Jiao
    Han, Jizhong
    Liu, Si
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 13630 - 13639
  • [38] Residual learning-based two-stream network for RGB-T object tracking
    Chen, Yili
    Wan, Minjie
    Xu, Yunkai
    Zhang, Xiaojie
    Chen, Qian
    Gu, Guohua
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (06)
  • [39] Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking
    Zhang, Pengyu
    Wang, Dong
    Lu, Huchuan
    Yang, Xiaoyun
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 : 2714 - 2729
  • [40] A unified RGB-T crowd counting learning framework
    Gu, Siqi
    Lian, Zhichao
    IMAGE AND VISION COMPUTING, 2023, 131