Multiple templates transformer for visual object tracking

Cited: 7
Authors
Pang, Haibo [1 ]
Su, Jie [1 ]
Ma, Rongqi [1 ]
Li, Tingting [1 ]
Liu, Chengming [1 ]
Affiliations
[1] Zhengzhou Univ, Sch Cyber Sci & Engn, Zhengzhou 450002, Peoples R China
Keywords
Single object tracking; Siamese tracker; Multiple templates; Transformer;
DOI
10.1016/j.knosys.2023.111025
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Matching the similarity between a template and the search region is crucial in Siamese trackers. However, because a fixed template provides only limited information, existing trackers are not robust enough in complex scenarios such as severe deformation, background clutter, out-of-view targets, illumination variation, low resolution, scale variation, fast motion, and full occlusion. It is therefore essential to use a more informative template. Additionally, since the Transformer offers superior modeling capability compared with traditional cross-correlation in tracking, some Siamese trackers have integrated Transformers and achieved exceptional performance. In this paper, we present a novel tracking architecture, the Multiple Templates Transformer (MTT), to address these issues. By utilizing multiple templates, the proposed method captures more contextual information and historical changes of the target, which are leveraged to enhance the response in the search region through an encoder-decoder framework. We also explore different mechanisms for fusing templates effectively to achieve higher accuracy. We evaluate MTT on several widely used benchmarks, including GOT-10k, TrackingNet, UAV123, OTB2015, VOT2018, and LaSOT. Extensive experimental results indicate that our tracker achieves better robustness under these challenges while maintaining real-time speed.
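
The abstract only outlines the encoder-decoder fusion at a high level. The following is a minimal, hypothetical PyTorch sketch (not the authors' code) of the general idea: several template feature maps are flattened to tokens, concatenated and self-attended in an encoder, and then used as memory for a decoder that lets the search-region tokens cross-attend to the fused templates. All names (MultiTemplateFusion, template_feats, search_feat, the 256-dim setting) and the simple concatenation-based fusion are illustrative assumptions, not details from the paper.

import torch
import torch.nn as nn

class MultiTemplateFusion(nn.Module):
    """Fuses several template token sequences and lets the search-region
    tokens attend to them via a standard transformer encoder-decoder."""
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=layers)

    def forward(self, template_feats, search_feat):
        # template_feats: list of (B, N_t, C) token sequences, one per template
        # search_feat:    (B, N_s, C) search-region tokens
        memory = self.encoder(torch.cat(template_feats, dim=1))  # self-attention across all template tokens
        return self.decoder(search_feat, memory)                 # search tokens cross-attend to fused templates

# Toy usage: three 8x8 template feature maps and one 16x16 search map, flattened to tokens.
templates = [torch.randn(1, 64, 256) for _ in range(3)]
search = torch.randn(1, 256, 256)
enhanced = MultiTemplateFusion()(templates, search)
print(enhanced.shape)  # torch.Size([1, 256, 256])

The paper states that several fusion mechanisms are explored; plain concatenation before the encoder is only the simplest possible stand-in here.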
Pages: 13
Related Papers
50 records in total
  • [11] Visual Attention Is Required for Multiple Object Tracking
    Tran, Annie
    Hoffman, James E.
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2016, 42 (12) : 2103 - 2114
  • [12] Visual Learning in Multiple-Object Tracking
    Makovski, Tal
    Vazquez, Gustavo A.
    Jiang, Yuhong V.
    PLOS ONE, 2008, 3 (05):
  • [13] Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking
    Xu, Tianyang
    Pan, Yifan
    Feng, Zhenhua
    Zhu, Xuefeng
    Cheng, Chunyang
    Wu, Xiao-Jun
    Kittler, Josef
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (12) : 6021 - 6038
  • [14] Foreground-Background Distribution Modeling Transformer for Visual Object Tracking
    Yang, Dawei
    He, Jianfeng
    Ma, Yinchao
    Yu, Qianjin
    Zhang, Tianzhu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10083 - 10093
  • [15] Sparse Transformer-Based Sequence Generation for Visual Object Tracking
    Tian, Dan
    Liu, Dong-Xin
    Wang, Xiao
    Hao, Ying
    IEEE ACCESS, 2024, 12 : 154418 - 154425
  • [16] FETrack: Feature-Enhanced Transformer Network for Visual Object Tracking
    Liu, Hang
    Huang, Detian
    Lin, Mingxin
    APPLIED SCIENCES-BASEL, 2024, 14 (22):
  • [17] Memory Prompt for Spatio-Temporal Transformer Visual Object Tracking
    Xu, T.
    Wu, X.
    Zhu, X.
    Kittler, J.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (08) : 1 - 6
  • [18] Transformer-Based Visual Object Tracking with Global Feature Enhancement
    Wang, Shuai
    Fang, Genwen
    Liu, Lei
    Wang, Jun
    Zhu, Kongfen
    Melo, Silas N.
    APPLIED SCIENCES-BASEL, 2023, 13 (23):
  • [19] The effect of visual distinctiveness on multiple object tracking performance
    Howe, Piers D. L.
    Holcombe, Alex O.
    FRONTIERS IN PSYCHOLOGY, 2012, 3
  • [20] Online learning of multiple detectors for visual object tracking
    Quan, Wei
    Chen, Jin-Xiong
    Yu, Nan-Yang
    TIEN TZU HSUEH PAO/ACTA ELECTRONICA SINICA, 2014, 42 (05) : 875 - 882