SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking

Cited by: 2
Authors
Fang, Yang [1]
Xie, Bailian [1]
Jiang, Bingbing [2]
Ke, Xuhui [1]
Li, Yan [3]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Key Lab Data Engn & Visual Comp, Chongqing, Peoples R China
[2] Hangzhou Normal Univ, Sch Informat Sci & Technol, Hangzhou, Peoples R China
[3] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Funding
National Natural Science Foundation of China
Keywords
Visual Transformer Tracking; Pyramid Pooling Attention; Feature Extraction and Correlation; Enhanced Correlation Block
DOI
10.22967/HCIS.2023.13.059
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recently, visual transformer-based tracking has achieved significant success owing to its effective attention modeling strategies and global context feature extraction. However, most transformer trackers are based on the canonical Siamese and correlation-based tracking paradigm, which comprises three stages: feature extraction, feature fusion, and similarity function learning. This paradigm is speculated to weaken the cross-correlation between the template and search features while increasing the computational cost of the tracking model. Hence, we propose a Siamese pyramid pooling transformer (SPPT) to implement a one-stream end-to-end visual object tracking framework with two newly proposed modules: an iterative pooling attention-based feature extraction and correlation (P-FEC) module and an iterative enhanced correlation block (ECB). The P-FEC module performs feature extraction and correlation simultaneously, whereas the ECB enhances feature integration and target-aware feature embedding learning. The SPPT has a much shorter attention sequence length, fewer parameters, and fewer floating-point operations (FLOPs) than existing transformer-based trackers. Extensive experiments on the LaSOT, TrackingNet, and GOT-10k benchmarks demonstrate that the proposed SPPT tracker achieves state-of-the-art precision and success scores compared with most convolutional neural network-based and transformer-based trackers.
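The abstract's claim of a shorter attention sequence can be illustrated with a minimal NumPy sketch of pooling attention in general. It is not the authors' P-FEC module: the function names, the pooling ratios (2, 4), the feature-map sizes, and the omission of learned query/key/value projections are all assumptions made for illustration. Template and search tokens form one joint query sequence (one-stream), while keys and values come from average-pooled copies of the same maps.

```python
import numpy as np

def avg_pool_2d(x, ratio):
    """Average-pool an (H, W, C) map with a square window and stride `ratio`."""
    h, w, c = x.shape
    hp, wp = h // ratio, w // ratio
    x = x[:hp * ratio, :wp * ratio]  # drop a ragged border, if any
    return x.reshape(hp, ratio, wp, ratio, c).mean(axis=(1, 3))

def softmax(a, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pyramid_pooled_tokens(feat, ratios):
    """Pool the map at every ratio and concatenate the results as tokens."""
    c = feat.shape[-1]
    return np.concatenate(
        [avg_pool_2d(feat, r).reshape(-1, c) for r in ratios], axis=0
    )

def pooling_attention(template, search, ratios=(2, 4)):
    """Hypothetical one-stream pooling attention (no learned projections)."""
    c = template.shape[-1]
    # Queries: every template and search token in a single joint sequence.
    q = np.concatenate([template.reshape(-1, c), search.reshape(-1, c)], axis=0)
    # Keys/values: pyramid-pooled tokens, far fewer than the queries.
    kv = np.concatenate(
        [pyramid_pooled_tokens(template, ratios),
         pyramid_pooled_tokens(search, ratios)],
        axis=0,
    )
    attn = softmax(q @ kv.T / np.sqrt(c))  # shape (len(q), len(kv))
    return attn @ kv, attn

rng = np.random.default_rng(0)
template = rng.standard_normal((8, 8, 16))   # assumed template feature map
search = rng.standard_normal((16, 16, 16))   # assumed search feature map
out, attn = pooling_attention(template, search)
print(out.shape, attn.shape)
```

With these assumed sizes the joint query sequence has 64 + 256 = 320 tokens, while the pooled key/value sequence has (16 + 4) + (64 + 16) = 100 tokens, so the attention matrix is 320 x 100 rather than 320 x 320.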
Pages: 18
Related papers
50 in total
  • [41] SiamRAAN: Siamese Residual Attentional Aggregation Network for Visual Object Tracking
    Xin, Zhiyi
    Yu, Junyang
    He, Xin
    Song, Yalin
    Li, Han
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [42] R-SiamNet: ROI-Align Pooling Baesd Siamese Network for Object Tracking
    Su, LiHui
    Wang, Yaowei
    Tian, Yonghong
    THIRD INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2020), 2020, : 19 - 24
  • [43] Hunt-inspired Transformer for visual object tracking
    Zhang, Zhibin
    Xue, Wanli
    Zhou, Yuxi
    Zhang, Kaihua
    Chen, Shengyong
    PATTERN RECOGNITION, 2024, 156
  • [44] Visual tracking via dynamic weighting with pyramid-redetection based Siamese networks
    Cao, Yi
    Ji, Hongbing
    Zhang, Wenbo
    Xue, Fei
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 65
  • [45] Mutual Learning and Feature Fusion Siamese Networks for Visual Object Tracking
    Jiang, Min
    Zhao, Yuyao
    Kong, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (08) : 3154 - 3167
  • [46] Learning Geometry Information of Target for Visual Object Tracking with Siamese Networks
    Chen, Hang
    Zhang, Weiguo
    Yan, Danghui
    SENSORS, 2021, 21 (23)
  • [47] DomainSiam: Domain-Aware Siamese Network for Visual Object Tracking
    Abdelpakey, Mohamed H.
    Shehata, Mohamed S.
    ADVANCES IN VISUAL COMPUTING, ISVC 2019, PT I, 2020, 11844 : 45 - 58
  • [48] A novel Siamese Attention Network for visual object tracking of autonomous vehicles
    Chen, Jia
    Ai, Yibo
    Qian, Yuhan
    Zhang, Weidong
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART D-JOURNAL OF AUTOMOBILE ENGINEERING, 2021, 235 (10-11) : 2764 - 2775
  • [49] Learning saliency-awareness Siamese network for visual object tracking
    Yang, Peng
    Wang, Qinghui
    Dou, Jie
    Dou, Lei
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 103
  • [50] Target-Cognisant Siamese Network for Robust Visual Object Tracking
    Jiang, Yingjie
    Song, Xiaoning
    Xu, Tianyang
    Feng, Zhenhua
    Wu, Xiaojun
    Kittler, Josef
    PATTERN RECOGNITION LETTERS, 2022, 163 : 129 - 135