Unsupervised RGB-T object tracking with attentional multi-modal feature fusion

Cited by: 0
Authors
Shenglan Li
Rui Yao
Yong Zhou
Hancheng Zhu
Bing Liu
Jiaqi Zhao
Zhiwen Shao
Affiliations
[1] China University of Mining and Technology, School of Computer Science and Technology
[2] Ministry of Education of the People's Republic of China, Engineering Research Center of Mine Digitization
Keywords
Unsupervised learning; RGB-T object tracking; Attention mechanism
DOI: not available
Abstract
RGB-T tracking means that, given the object position in the first frame, a tracker is trained to predict the object's position in subsequent frames by taking full advantage of the complementary information in RGB and thermal infrared images. As the amount of available data grows, unsupervised training has great potential for the RGB-T tracking task. It is well known that features extracted from different convolutional layers provide different levels of information about an image. In this paper, we propose a visual tracking framework based on attention-mechanism fusion of multi-modal and multi-level features. This fusion method fully exploits the advantages of multi-level and multi-modal information. Specifically, we use a feature fusion module to fuse features from different levels and different modalities simultaneously. We use cycle consistency based on a correlation filter to train the model without supervision, reducing the cost of annotated data. The proposed tracker is evaluated on two popular benchmark datasets, GTOT and RGB-T234. Experimental results show that our tracker performs favorably against other state-of-the-art unsupervised trackers while running at real-time speed.
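The attentional fusion the abstract describes could, in its simplest form, learn or compute a scalar weight per branch (one branch per modality and per convolutional level) and combine the branches by weighted summation. The following minimal numpy sketch illustrates that idea; the function name `attention_fuse` and the global-average-pooling-plus-softmax weighting are assumptions for illustration, not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feats):
    """Fuse per-branch feature maps, each of shape (C, H, W).

    A branch is one (modality, level) pair, e.g. RGB conv3 or thermal conv5.
    Attention weights come from a global-average-pooling descriptor of each
    branch, normalized with a softmax (hypothetical stand-in for the paper's
    learned attention module).
    """
    stack = np.stack(feats)                 # (M, C, H, W), M = number of branches
    desc = stack.mean(axis=(1, 2, 3))       # one scalar descriptor per branch
    w = softmax(desc, axis=0)               # attention weights, sum to 1
    fused = np.tensordot(w, stack, axes=1)  # weighted sum -> (C, H, W)
    return fused, w

# Toy usage: fuse one RGB and one thermal feature map of the same shape.
rgb_feat = np.random.rand(64, 20, 20)
tir_feat = np.random.rand(64, 20, 20)
fused, w = attention_fuse([rgb_feat, tir_feat])
```

In an unsupervised setting such as the cycle-consistency scheme mentioned above, a fused representation like this would feed the correlation filter, and the training signal would come from tracking forward and then backward to the initial frame rather than from annotated boxes.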
Pages: 23595-23613
Page count: 18
Related papers
50 items in total
  • [1] Unsupervised RGB-T object tracking with attentional multi-modal feature fusion
    Li, Shenglan
    Yao, Rui
    Zhou, Yong
    Zhu, Hancheng
    Liu, Bing
    Zhao, Jiaqi
    Shao, Zhiwen
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (15) : 23595 - 23613
  • [2] Multi-modal adapter for RGB-T tracking
    Wang, He
    Xu, Tianyang
    Tang, Zhangyong
    Wu, Xiao-Jun
    Kittler, Josef
    INFORMATION FUSION, 2025, 118
  • [3] Multi-Modal Fusion for End-to-End RGB-T Tracking
    Zhang, Lichao
    Danelljan, Martin
    Gonzalez-Garcia, Abel
    van de Weijer, Joost
    Khan, Fahad Shahbaz
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 2252 - 2261
  • [4] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [5] Multi-modal interaction with token division strategy for RGB-T tracking
    Cai, Yujue
    Sui, Xiubao
    Gu, Guohua
    Chen, Qian
    PATTERN RECOGNITION, 2024, 155
  • [6] Multi-Modal Object Tracking and Image Fusion With Unsupervised Deep Learning
    LaHaye, Nicholas
    Ott, Jordan
    Garay, Michael J.
    El-Askary, Hesham Mohamed
    Linstead, Erik
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (08) : 3056 - 3066
  • [7] Multi-modal neural networks with multi-scale RGB-T fusion for semantic segmentation
    Lyu, Y.
    Schiopu, I.
    Munteanu, A.
    ELECTRONICS LETTERS, 2020, 56 (18) : 920 - 922
  • [8] SCA-MMA: Spatial and Channel-Aware Multi-Modal Adaptation for Robust RGB-T Object Tracking
    Shi, Run
    Wang, Chaoqun
    Zhao, Gang
    Xu, Chunyan
    ELECTRONICS, 2022, 11 (12)
  • [9] FEATURE ENHANCEMENT AND FUSION FOR RGB-T SALIENT OBJECT DETECTION
    Sun, Fengming
    Zhang, Kang
    Yuan, Xia
    Zhao, Chunxia
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1300 - 1304
  • [10] Revisiting Feature Fusion for RGB-T Salient Object Detection
    Zhang, Qiang
    Xiao, Tonglin
    Huang, Nianchang
    Zhang, Dingwen
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (05) : 1804 - 1818