Multi-Level Fusion for Robust RGBT Tracking via Enhanced Thermal Representation

Cited by: 1
Authors
Tang, Zhangyong [1]
Xu, Tianyang [1]
Wu, Xiao-jun [1]
Kittler, Josef [2]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, England
Funding
National Natural Science Foundation of China; Engineering and Physical Sciences Research Council (UK)
Keywords
Visual object tracking; RGBT tracking; thermal enhancement; multi-modal multi-level fusion; BENCHMARK;
DOI
10.1145/3678176
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Due to the limitations of visible (RGB) sensors in challenging scenarios, such as nighttime and foggy environments, the thermal infrared (TIR) modality is drawing increasing attention as an auxiliary source for robust tracking systems. Existing methods extract RGB and TIR (RGBT) cues in the same way, i.e., using RGB-pretrained models with or without finetuning, and then aggregate the multi-modal information through a fusion block embedded at a single level. However, the different imaging principles of RGB and TIR data raise questions about the suitability of RGB-pretrained models for thermal data. This article argues that the modality gap has been overlooked and proposes an alternative training paradigm for TIR data that ensures consistency between the training and test data by optimising the TIR feature extractor with TIR data only. Furthermore, to make better use of the enhanced thermal representations, a multi-level fusion strategy is proposed, motivated by the observation that fusion strategies at different levels can each contribute to better performance. Specifically, fusion modules at both the feature and decision levels are designed for a comprehensive fusion procedure, while pixel-level fusion is not considered because the multi-modal image pairs are misaligned. The effectiveness of the method is demonstrated by extensive qualitative and quantitative experiments on several challenging benchmarks. Code will be released at https://github.com/Zhangyong-Tang/MELT.
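The distinction the abstract draws between feature-level and decision-level fusion can be illustrated with a minimal sketch. This is not the authors' MELT implementation: the function names `fuse_features` and `fuse_decisions`, the `tir_weight` parameter, and the choice of a convex combination and score averaging are all assumptions made purely for illustration.

```python
from typing import List, Sequence


def fuse_features(rgb_feat: Sequence[float],
                  tir_feat: Sequence[float],
                  tir_weight: float = 0.5) -> List[float]:
    """Feature-level fusion (hypothetical): element-wise convex
    combination of an RGB feature vector and a TIR feature vector."""
    if len(rgb_feat) != len(tir_feat):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - tir_weight) * r + tir_weight * t
            for r, t in zip(rgb_feat, tir_feat)]


def fuse_decisions(score_lists: Sequence[Sequence[float]]) -> int:
    """Decision-level fusion (hypothetical): average the per-candidate
    scores produced by several branches and return the index of the
    candidate with the highest mean score."""
    n_candidates = len(score_lists[0])
    if any(len(scores) != n_candidates for scores in score_lists):
        raise ValueError("all branches must score the same candidates")
    mean_scores = [sum(column) / len(score_lists)
                   for column in zip(*score_lists)]
    return max(range(n_candidates), key=mean_scores.__getitem__)


if __name__ == "__main__":
    # Toy feature vectors standing in for RGB and TIR backbone outputs.
    fused = fuse_features([1.0, 3.0], [3.0, 1.0])
    # Scores for three candidate boxes from RGB, TIR and fused branches.
    best = fuse_decisions([[0.9, 0.2, 0.1],
                           [0.1, 0.3, 0.8],
                           [0.2, 0.9, 0.3]])
    print(fused, best)
```

In this toy setting the two levels are complementary: the feature-level step merges the modalities before any prediction is made, while the decision-level step reconciles predictions that were made separately. A learned fusion module would replace the fixed weight and the plain average used here.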
Pages: 24