Sparse Transformer-Based Sequence Generation for Visual Object Tracking

Cited by: 0
Authors
Tian, Dan [1 ]
Liu, Dong-Xin [2 ]
Wang, Xiao [2 ]
Hao, Ying [2 ]
Affiliations
[1] Shenyang Univ, Sch Intelligent Syst Sci & Engn, Shenyang 110044, Liaoning, Peoples R China
[2] Shenyang Univ, Sch Informat Engn, Shenyang 110044, Liaoning, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Transformers; Visualization; Target tracking; Decoding; Feature extraction; Attention mechanisms; Object tracking; Training; Interference; Attention mechanism; sequence generation; sparse attention; visual object tracking; vision transformer;
DOI
10.1109/ACCESS.2024.3482468
CLC number
TP [Automation Technology & Computer Technology]
Discipline code
0812
Abstract
In visual object tracking, attention mechanisms can flexibly and efficiently model complex dependencies and global information, which improves tracking accuracy. However, in scenes that contain a large amount of background or other distracting information, global attention can dilute the weight of important information and allocate unnecessary attention to the background, thereby degrading tracking performance. To alleviate this problem, this paper proposes a visual object tracking framework based on a sparse transformer. The framework is a simple encoder-decoder structure that predicts the target in an autoregressive manner, eliminating the additional head network and simplifying the tracking architecture. Furthermore, we introduce a Sparse Attention Mechanism (SMA) in the cross-attention layers of the decoder. Unlike traditional attention, SMA attends only to the top-K values most relevant to the current query when computing attention weights. This lets the model concentrate on key information and better discriminate foreground from background, yielding more accurate and robust tracking. Experiments on six tracking benchmarks demonstrate the effectiveness of our method.
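The record contains no implementation, so below is a minimal PyTorch sketch (not from the paper) of the top-K sparse attention the abstract describes: for each query, all but the K largest attention scores are masked out before the softmax, so weight is spent only on the most relevant keys. All names and shapes (topk_sparse_attention, top_k, the 64-dim example tensors) are illustrative assumptions.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=32):
    """Sketch of top-K sparse attention: each query attends only to its
    top_k highest-scoring keys; all other scores are masked to -inf
    before the softmax. Shapes: q (B, Lq, D), k and v (B, Lk, D)."""
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5   # (B, Lq, Lk)
    top_k = min(top_k, scores.size(-1))
    topk_vals, _ = scores.topk(top_k, dim=-1)                  # (B, Lq, top_k)
    threshold = topk_vals[..., -1:]                            # K-th largest score per query
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    attn = F.softmax(scores, dim=-1)                           # sparse attention weights
    return torch.matmul(attn, v)

# Example: one decoder query token attending to 256 encoder tokens.
q = torch.randn(1, 1, 64)
kv = torch.randn(1, 256, 64)
out = topk_sparse_attention(q, kv, kv, top_k=16)
print(out.shape)  # torch.Size([1, 1, 64])

In the paper's decoder this sparsification would sit in the cross-attention layer, between the autoregressive target-sequence queries and the encoder's search-region features; the sketch omits multi-head projections for brevity.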
Pages: 154418-154425
Number of pages: 8
Related Papers
50 records in total
  • [1] Wang, Shuai; Fang, Genwen; Liu, Lei; Wang, Jun; Zhu, Kongfen; Melo, Silas N. Transformer-Based Visual Object Tracking with Global Feature Enhancement. APPLIED SCIENCES-BASEL, 2023, 13 (23).
  • [2] Gao, Long; Chen, Langkun; Liu, Pan; Jiang, Yan; Xie, Weiying; Li, Yunsong. A Transformer-Based Network for Hyperspectral Object Tracking. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61.
  • [3] Liu, Chang; Zhang, Bin; Bo, Chunjuan; Wang, Dong. Query-Based Object Visual Tracking with Parallel Sequence Generation. SENSORS, 2024, 24 (15).
  • [4] Psalta, Athena; Tsironis, Vasileios; Karantzalos, Konstantinos. Transformer-based assignment decision network for multiple object tracking. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241.
  • [5] Wan, Qin; Ge, Zhu; Yang, Yang; Shen, Xuejun; Zhong, Hang; Zhang, Hui; Wang, Yaonan; Wu, Di. A transformer-based lightweight method for multiple-object tracking. IET IMAGE PROCESSING, 2024, 18 (09): 2329-2345.
  • [6] Wu, Fan; Zhang, Yifeng. UniTracker: transformer-based CrossUnihead for multi-object tracking. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (04).
  • [7] Cheng, Feng; Peng, Gaoliang; Li, Junbao; Zhao, Benqi; Pan, Jeng-Shyang; Li, Hang. A Transformer-based network with adaptive spatial prior for visual tracking. NEUROCOMPUTING, 2025, 614.
  • [8] Peng, Longkun; An, Gaoyun; Ruan, Qiuqi. Transformer-based Sparse Encoder and Answer Decoder for Visual Question Answering. 2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022: 120-123.
  • [9] Li, Jiaxin; Li, Hongjun. Transformer-Based Multi-object Tracking in Unmanned Aerial Vehicles. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430: 347-358.
  • [10] Agrawal, Harshit; Halder, Agrya; Chattopadhyay, Pratik. MotionFormer: An Improved Transformer-Based Architecture for Multi-object Tracking. COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011: 212-224.