Multi-Level Fusion for Robust RGBT Tracking via Enhanced Thermal Representation

Cited by: 1
Authors
Tang, Zhangyong [1]
Xu, Tianyang [1]
Wu, Xiao-jun [1]
Kittler, Josef [2]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, England
Funding
National Natural Science Foundation of China; UK Engineering and Physical Sciences Research Council;
Keywords
Visual object tracking; RGBT tracking; thermal enhancement; multi-modal multi-level fusion; BENCHMARK;
DOI
10.1145/3678176
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Due to the limitations of visible (RGB) sensors in challenging scenarios, such as nighttime and foggy environments, the thermal infrared (TIR) modality draws increasing attention as an auxiliary source for robust tracking systems. Currently, the existing methods extract both the RGB and TIR (RGBT) clues in a similar approach, i.e., utilising RGB-pretrained models with or without finetuning, and then aggregate the multi-modal information through a fusion block embedded in a single level. However, the different imaging principles of RGB and TIR data raise questions about the suitability of RGB-pretrained models for thermal data. In this article, it is argued that the modality gap is overlooked, and an alternative training paradigm is proposed for TIR data to ensure consistency between the training and test data, which is achieved by optimising the TIR feature extractor with only TIR data involved. Furthermore, with the goal of making better use of the enhanced thermal representations, a multi-level fusion strategy is inspired by the observation that various fusion strategies at different levels can contribute to a better performance. Specifically, fusion modules at both the feature and decision levels are derived for a comprehensive fusion procedure while the pixel-level fusion strategy is not considered due to the misalignment of multi-modal image pairs. The effectiveness of our method is demonstrated by extensive qualitative and quantitative experiments conducted on several challenging benchmarks. Code will be released at https://github.com/Zhangyong-Tang/MELT.
Pages: 24
Related papers
50 in total
  • [1] End-to-End Correlation Tracking With Enhanced Multi-Level Feature Fusion
    Liu, Guangen
    Liu, Guizhong
    IEEE ACCESS, 2021, 9 : 128827 - 128840
  • [2] Multi-level enhanced target identification fusion method
    Ku, JK
    Ock, SY
    SENSOR FUSION: ARCHITECTURES, ALGORITHMS, AND APPLICATIONS VI, 2002, 4731 : 188 - 195
  • [3] Robust thermal infrared tracking via an adaptively multi-feature fusion model
    Di Yuan
    Xiu Shu
    Qiao Liu
    Xinming Zhang
    Zhenyu He
    Neural Computing and Applications, 2023, 35 : 3423 - 3434
  • [4] Triplet Network with Multi-level Feature Fusion for Object Tracking
    Cao, Yang
    Wan, Bo
    Wang, Quan
    Cheng, Fei
    2020 JOINT 9TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV) AND 2020 4TH INTERNATIONAL CONFERENCE ON IMAGING, VISION & PATTERN RECOGNITION (ICIVPR), 2020,
  • [5] Robust thermal infrared tracking via an adaptively multi-feature fusion model
    Yuan, Di
    Shu, Xiu
    Liu, Qiao
    Zhang, Xinming
    He, Zhenyu
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (04): : 3423 - 3434
  • [6] Joint Sparse Representation and Robust Feature-Level Fusion for Multi-Cue Visual Tracking
    Lan, Xiangyuan
    Ma, Andy J.
    Yuen, Pong C.
    Chellappa, Rama
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (12) : 5826 - 5841
  • [7] Multi-modal multi-task feature fusion for RGBT tracking
    Cai, Yujue
    Sui, Xiubao
    Gu, Guohua
    INFORMATION FUSION, 2023, 97
  • [8] Robust multi-level video representation using mean shift analysis
    Gao, H
    Yu, XD
    Wang, L
    Xue, P
    Tian, Q
    2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXP (ICME), VOLS 1-3, 2004, : 627 - 630
  • [9] Learning modality feature fusion via transformer for RGBT-tracking
    Cai, Yujue
    Sui, Xiubao
    Gu, Guohua
    Chen, Qian
    INFRARED PHYSICS & TECHNOLOGY, 2023, 133
  • [10] RGBT Tracking via Progressive Fusion Transformer With Dynamically Guided Learning
    Zhu, Yabin
    Li, Chenglong
    Wang, Xiao
    Tang, Jin
    Huang, Zhixiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 8722 - 8735