SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking

Cited by: 2
Authors
Fang, Yang [1]
Xie, Bailian [1]
Jiang, Bingbing [2]
Ke, Xuhui [1]
Li, Yan [3]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Key Lab Data Engn & Visual Comp, Chongqing, Peoples R China
[2] Hangzhou Normal Univ, Sch Informat Sci & Technol, Hangzhou, Peoples R China
[3] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Funding
National Natural Science Foundation of China
Keywords
Visual Transformer Tracking; Pyramid Pooling Attention; Feature Extraction and Correlation; Enhanced Correlation Block
DOI
10.22967/HCIS.2023.13.059
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recently, visual transformer-based tracking has achieved significant success owing to its effective attention modeling strategies and global context feature extraction. However, most transformer trackers are based on the canonical Siamese and correlation-based tracking paradigm, which comprises three stages: feature extraction, feature fusion, and similarity function learning. This paradigm is speculated to weaken the cross-correlation between the template and search features while increasing the computational cost of the tracking model. Hence, we propose a Siamese pyramid pooling transformer (SPPT) to implement a one-stream end-to-end visual object tracking framework with two newly proposed modules: an iterative pooling attention-based feature extraction and correlation (P-FEC) module and an iterative enhanced correlation block (ECB). The P-FEC module simultaneously performs feature extraction and correlation, whereas the ECB enhances feature integration and target-aware feature embedding learning. The SPPT has a much shorter attention sequence length, fewer parameters, and fewer floating-point operations (FLOPs) than existing transformer-based trackers. Extensive experiments on the LaSOT, TrackingNet, and GOT-10k benchmarks demonstrate that the proposed SPPT tracker achieves state-of-the-art tracking performance in terms of precision and success scores compared with most convolutional neural network-based and transformer-based trackers.
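To illustrate why pooled attention shortens the attention sequence, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that template and search tokens are concatenated into one query stream (the one-stream idea) and that keys and values are average-pooled at a few pyramid scales. The function names, the scales (1, 2, 4), the grid sizes, and the omission of learned projections are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def avg_pool_tokens(tokens, h, w, out):
    """Average-pool an (h*w, c) token grid down to (out*out, c).

    Assumes h and w are both divisible by out (illustrative restriction).
    """
    c = tokens.shape[1]
    grid = tokens.reshape(h, w, c)
    # Split each spatial axis into (out, block) and average over the blocks.
    grid = grid.reshape(out, h // out, out, w // out, c).mean(axis=(1, 3))
    return grid.reshape(out * out, c)

def pyramid_pooling_attention(template, search, t_hw, s_hw, scales=(1, 2, 4)):
    """Joint attention over template+search tokens with pooled keys/values.

    Hypothetical sketch: no learned query/key/value projections, single head.
    Returns the attended tokens and the pooled key/value sequence length.
    """
    # One stream: every template and search token acts as a query.
    queries = np.concatenate([template, search], axis=0)
    pooled = []
    for out in scales:
        pooled.append(avg_pool_tokens(template, t_hw[0], t_hw[1], out))
        pooled.append(avg_pool_tokens(search, s_hw[0], s_hw[1], out))
    kv = np.concatenate(pooled, axis=0)  # much shorter than the query stream
    # Scaled dot-product attention with a numerically stable softmax.
    scores = queries @ kv.T / np.sqrt(queries.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ kv, kv.shape[0]

rng = np.random.default_rng(0)
template = rng.standard_normal((8 * 8, 32))    # assumed 8x8 template grid
search = rng.standard_normal((16 * 16, 32))    # assumed 16x16 search grid
fused, kv_len = pyramid_pooling_attention(template, search, (8, 8), (16, 16))
print(fused.shape, kv_len)
```

Under these assumed sizes, the 320 query tokens attend to 42 pooled key/value tokens rather than to all 320 tokens, so the attention matrix is roughly 7.6 times smaller. Because the pooled set mixes template and search tokens, each attention step both extracts features and correlates the two inputs.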
Pages: 18