Exploiting spatial and temporal context for online tracking with improved transformer

Cited by: 0
Authors
Zhang, Jianwei [1]
Wang, Jingchao [1]
Zhang, Huanlong [2]
Miao, Mengen [1]
Zhang, Jie [2]
Wu, Di [3]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 475000, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 475000, Peoples R China
[3] Yellow River Engn Consulting Co Ltd, Zhengzhou 450003, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual tracking; Classification and regression network; Spatial and temporal context; Transformer
DOI
10.1016/j.imavis.2023.104672
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The transformer is increasingly popular in computer vision tasks because it captures long-range dependencies via self-attention. In this paper, we propose TrCAR, a transformer-based classification and regression network that uses the transformer to exploit deeper spatial and temporal context. Unlike the classic transformer architecture, we introduce convolution operations into the transformer and change how features are computed to suit the tracking task. The improved transformer encoder is then introduced into the regression branch of TrCAR and combined with a feature pyramid to perform multi-layer feature fusion, which helps obtain a high-quality target representation. To let the target model adapt to changes in target appearance, we apply gradient descent to the regression branch so that it can be updated online to produce a more precise bounding box. Meanwhile, the new transformer is integrated into the classification branch of TrCAR, where its global computing capability extracts the essential features of the target across historical frames and uses them to emphasize the target position in the current frame via cross-attention, which helps the classifier identify the correct target more easily. Experimental results on the OTB, LaSOT, VOT2018, NFS, GOT-10k, and TrackingNet benchmarks show that TrCAR achieves performance comparable to popular trackers.
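The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch of generic scaled dot-product cross-attention. This is not the TrCAR implementation: the function names `softmax` and `cross_attention`, the plain-list feature layout, and the omission of learned projections and convolutions are all assumptions made for illustration. Queries stand in for current-frame features, and keys and values stand in for features gathered from historical frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: feature vectors from the current frame (hypothetical layout).
    keys, values: feature vectors from historical frames (hypothetical layout).
    Returns one output vector per query: a weighted average of the values,
    with weights given by query-key similarity.
    """
    dim = len(keys[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        # Similarity of this current-frame query to every historical key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Blend the historical values according to the attention weights.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

In this sketch, a current-frame location whose query matches the historical target features receives an output dominated by those features, which is one way to read "emphasize the target position" in the abstract.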
Pages: 11
Related papers
50 in total
  • [21] AMST2: aggregated multi-level spatial and temporal context-based transformer for robust aerial tracking
    Park, Hasil
    Lee, Injae
    Jeong, Dasol
    Paik, Joonki
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [22] AMST2: aggregated multi-level spatial and temporal context-based transformer for robust aerial tracking
    Hasil Park
    Injae Lee
    Dasol Jeong
    Joonki Paik
    Scientific Reports, 13
  • [23] An improved spatial–temporal regularization method for visual object tracking
    Muhammad Umar Hayat
    Ahmad Ali
    Baber Khan
    Khizer Mehmood
    Khitab Ullah
    Muhammad Amir
    Signal, Image and Video Processing, 2024, 18 : 2065 - 2077
  • [24] Spatial-temporal graph Transformer for object tracking against noise interference
    Li, Ning
    Sang, Haiwei
    Zheng, Jiamin
    Ma, Huawei
    Wang, Xiaoying
    Xiao, Fu'an
    INFORMATION SCIENCES, 2024, 678
  • [25] Online spatial-temporal data fusion for robust adaptive tracking
    Chen, Jixu
    Ji, Qiang
    2007 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-8, 2007, : 3326 - +
  • [26] Target-Aware Tracking With Spatial-Temporal Context Attention
    He, Kai-Jie
    Zhang, Can-Long
    Xie, Sheng
    Li, Zhi-Xin
    Wang, Zhi-Wen
    Qin, Rui-Guo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7176 - 7189
  • [27] Action unit detection by exploiting spatial-temporal and label-wise attention with transformer
    Wang, Lingfeng
    Qi, Jin
    Cheng, Jian
    Suzuki, Kenji
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2469 - 2474
  • [28] Long-term correlation tracking via spatial–temporal context
    Zhi Chen
    Peizhong Liu
    Yongzhao Du
    Yanmin Luo
    Jing-Ming Guo
    The Visual Computer, 2020, 36 : 425 - 442
  • [29] An object tracking algorithm based on optical flow and temporal–spatial context
    Yongliang Ma
    Cluster Computing, 2019, 22 : 5739 - 5747
  • [30] Tracking Algorithm of Improved Spatio-Temporal Context with Particle Filter
    Wen, Wu
    Wu, Lizhi
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 1549 - 1553