SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking

Cited by: 2
Authors
Fang, Yang [1 ]
Xie, Bailian [1 ]
Jiang, Bingbing [2 ]
Ke, Xuhui [1 ]
Li, Yan [3 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Key Lab Data Engn & Visual Comp, Chongqing, Peoples R China
[2] Hangzhou Normal Univ, Sch Informat Sci & Technol, Hangzhou, Peoples R China
[3] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Funding
National Natural Science Foundation of China
Keywords
Visual Transformer Tracking; Pyramid Pooling Attention; Feature Extraction and Correlation; Enhanced Correlation Block
DOI
10.22967/HCIS.2023.13.059
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recently, visual transformer-based tracking has achieved significant success owing to its effective attention modeling strategies and global context feature extraction. However, most transformer trackers follow the canonical Siamese, correlation-based tracking paradigm, which comprises three stages: feature extraction, feature fusion, and similarity function learning. This paradigm tends to weaken the cross-correlation between the template and search features while increasing the computational cost of the tracking model. Hence, we propose a Siamese pyramid pooling transformer (SPPT) to implement a one-stream end-to-end visual object tracking framework with two newly proposed modules: an iterative pooling attention-based feature extraction and correlation (P-FEC) module and an iterative enhanced correlation block (ECB). The P-FEC module performs feature extraction and correlation simultaneously, whereas the ECB enhances feature integration and target-aware feature embedding learning. The SPPT has a much shorter attention sequence length, fewer parameters, and fewer floating-point operations (FLOPs) than existing transformer-based trackers. Extensive experiments on the LaSOT, TrackingNet, and GOT-10k benchmarks demonstrate that our proposed SPPT tracker achieves state-of-the-art tracking performance in terms of precision and success scores, compared with most convolutional neural network-based and transformer-based trackers.
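The abstract states that pooling attention shortens the attention sequence but gives no detail. Below is a minimal PyTorch sketch (not the authors' released code) of a pyramid pooling attention layer of the kind the P-FEC module is described as using: keys and values are computed from multi-scale pooled tokens, so attention cost scales with the small pooled length rather than the full template-plus-search token length. All module names, pooling output sizes, and tensor shapes here are illustrative assumptions.

# A minimal sketch of pyramid pooling attention; NOT the authors' code.
# Module names, pooling sizes, and shapes are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingAttention(nn.Module):
    def __init__(self, dim, num_heads=8, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.pool_sizes = pool_sizes              # assumed spatial sizes of the pooling pyramid
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, hw):
        """x: (B, N, C) tokens of the joint template+search stream; hw: (H, W) with H*W == N."""
        B, N, C = x.shape
        H, W = hw
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Build a short key/value sequence by average-pooling the token map at several scales.
        feat = x.transpose(1, 2).reshape(B, C, H, W)
        pooled = [F.adaptive_avg_pool2d(feat, s).flatten(2) for s in self.pool_sizes]
        pooled = torch.cat(pooled, dim=2).transpose(1, 2)        # (B, M, C), M = 1+4+16+64 = 85 << N

        kv = self.kv(pooled).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)                         # each (B, heads, M, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale            # (B, heads, N, M): N x M, not N x N
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Toy usage on a 16x16 token map (sizes made up purely for illustration).
tokens = torch.randn(2, 16 * 16, 256)
layer = PyramidPoolingAttention(dim=256)
out = layer(tokens, hw=(16, 16))
print(out.shape)  # torch.Size([2, 256, 256])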
Pages: 18
Related Papers
50 records in total
  • [1] PPTtrack: Pyramid pooling based Transformer backbone for visual tracking
    Wang, Jun
    Yang, Shuai
    Wang, Yuanyun
    Yang, Guang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [2] SIAMESE FEATURE PYRAMID NETWORK FOR VISUAL TRACKING
    Chang, Shuo
    Zhang, Fan
    Huang, Sai
    Yao, Yuanyuan
    Zhao, Xiaotong
    Feng, Zhiyong
    2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS IN CHINA (ICCC WORKSHOPS), 2019, : 164 - 168
  • [3] Siamese Visual Object Tracking: A Survey
    Ondrasovic, Milan
    Tarabek, Peter
    IEEE ACCESS, 2021, 9 : 110149 - 110172
  • [4] Multiple Object Tracking via Feature Pyramid Siamese Networks
    Lee, Sangyun
    Kim, Euntai
    IEEE ACCESS, 2019, 7 : 8181 - 8194
  • [5] Siamese network with transformer and saliency encoder for object tracking
    Liu, Lei
    Kong, Guangqian
    Duan, Xun
    Long, Huiyun
    Wu, Yun
    APPLIED INTELLIGENCE, 2023, 53 (02) : 2265 - 2279
  • [7] Siamese Feedback Network for Visual Object Tracking
    Gwon M.-G.
    Kim J.
    Um G.-M.
    Lee H.
    Seo J.
    Lim S.Y.
    Yang S.-J.
    Kim W.
    IEIE Transactions on Smart Processing and Computing, 2022, 11 (01): : 24 - 33
  • [8] Online Siamese Network for Visual Object Tracking
    Chang, Shuo
    Li, Wei
    Zhang, Yifan
    Feng, Zhiyong
    SENSORS, 2019, 19 (08)
  • [9] Siamese Transformer Pyramid Networks for Real-Time UAV Tracking
    Xing, Daitao
    Evangeliou, Nikolaos
    Tsoukalas, Athanasios
    Tzes, Anthony
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 1898 - 1907
  • [10] Learning Dynamic Siamese Network for Visual Object Tracking
    Guo, Qing
    Feng, Wei
    Zhou, Ce
    Huang, Rui
    Wan, Liang
    Wang, Song
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1781 - 1789