Efficient transformer tracking with adaptive attention

Times Cited: 0
Authors
Xiao, Dingkun [1 ]
Wei, Zhenzhong [1 ]
Zhang, Guangjun [1 ]
Affiliations
[1] Beihang Univ, Sch Instrumentat & Optoelect Engn, Key Lab Precis Optomechatron Technol, Minist Educ, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
computer vision; convolution; convolutional neural nets; object tracking; target tracking; tracking;
DOI
10.1049/cvi2.12315
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, several trackers built on the Transformer architecture have shown significant performance improvements. However, the high computational cost of multi-head attention, a core component of the Transformer, limits real-time running speed, which is crucial for tracking tasks. Additionally, the global mechanism of multi-head attention makes it susceptible to distractors that share semantic information with the target. To address these issues, the authors propose a novel adaptive attention that enhances features through a spatial sparse attention mechanism with less than 1/4 of the computational complexity of multi-head attention. The adaptive attention sets a perception range around each element in the feature map, based on the target scale in the previous tracking result, and adaptively searches for the information of interest. This allows the module to focus on the target region rather than on background distractors. Based on adaptive attention, the authors build an efficient Transformer tracking framework. It performs deep interaction between search and template features to activate target information and aggregates multi-level interaction features to enhance representation ability. Evaluation results on seven benchmarks show that the authors' tracker achieves outstanding performance at 43 fps, with significant advantages in challenging scenarios.
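The spatial sparse attention described in the abstract can be illustrated with a minimal sketch. This is an assumption for illustration only, not the authors' published implementation: `adaptive_local_attention` and the square-window `radius` (standing in for the scale-dependent perception range) are hypothetical names, and the query/key/value projections of a real attention layer are omitted. Each position attends only to the (2r+1)^2 positions inside its window instead of all H*W positions, which is the general route to sub-quadratic cost that sparse attention mechanisms take.

```python
import numpy as np

def adaptive_local_attention(feat, radius):
    """Sparse attention sketch: every position attends only to neighbours
    within `radius` (a perception range, e.g. derived from the previous
    target scale), instead of the full H*W map as in multi-head attention.
    Hypothetical illustration; q/k/v projections and heads are omitted."""
    H, W, C = feat.shape
    q = feat.reshape(H * W, C)
    # (row, col) coordinate of every flattened position
    coords = np.stack(
        np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1
    ).reshape(-1, 2)
    out = np.zeros_like(q)
    for i, (y, x) in enumerate(coords):
        # boolean mask of positions inside the square perception range
        near = (np.abs(coords[:, 0] - y) <= radius) & \
               (np.abs(coords[:, 1] - x) <= radius)
        k = q[near]                              # keys/values in the window
        w = np.exp(q[i] @ k.T / np.sqrt(C))      # scaled dot-product scores
        out[i] = (w / w.sum()) @ k               # softmax-weighted sum
    return out.reshape(H, W, C)
```

With `radius=1` on an 8x8 map, each query compares against at most 9 keys rather than 64, illustrating how restricting the perception range can bring the cost well below that of full global attention.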
Pages: 13
Related Papers (50 total)
  • [41] Efficient image analysis with triple attention vision transformer
    Li, Gehui
    Zhao, Tongtong
    PATTERN RECOGNITION, 2024, 150
  • [42] Chain and Causal Attention for Efficient Entity Tracking
    Fagnou, Erwan
    Caillon, Paul
    Delattre, Blaise
    Allauzen, Alexandre
    arXiv,
  • [43] EANTrack: An Efficient Attention Network for Visual Tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    Zhu, Qidan
    Ju, Zhaojie
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (04) : 5911 - 5928
  • [44] MixFormerV2: Efficient Fully Transformer Tracking
    Cui, Yutao
    Song, Tianhui
    Wu, Gangshan
    Wang, Limin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [45] Siamese hierarchical feature fusion transformer for efficient tracking
    Dai, Jiahai
    Fu, Yunhao
    Wang, Songxin
    Chang, Yuchun
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [46] Voice Activity Detection Optimized by Adaptive Attention Span Transformer
    Mu, Wenpeng
    Liu, Bingshan
    IEEE ACCESS, 2023, 11 : 31238 - 31243
  • [47] ASAFormer: Visual tracking with convolutional vision transformer and asymmetric selective attention
    Gong, Xiaomei
    Zhang, Yi
    Hu, Shu
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [49] Self-Attention-Based Transformer for Nonlinear Maneuvering Target Tracking
    Shen, Lu
    Su, Hongtao
    Li, Ze
    Jia, Congyue
    Yang, Ruixing
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [50] Siamese Attention Networks with Adaptive Templates for Visual Tracking
    Zhang, Bo
    Liang, Zhixue
    Dong, Wenyong
    MOBILE INFORMATION SYSTEMS, 2022, 2022