ATFTrans: attention-weighted token fusion transformer for robust and efficient object tracking

Cited by: 2
Authors
Xu, Liang [1]
Wang, Liejun [1]
Guo, Zhiqing [1]
Affiliation
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830000, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2024 / Vol. 36 / No. 13
Funding
National Natural Science Foundation of China;
Keywords
Fully transformer-based tracker; Token fusion; Information loss; Efficient inference;
DOI
10.1007/s00521-024-09444-0
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, fully transformer-based trackers have achieved impressive tracking results, but at the cost of high computational complexity. Some researchers have applied token pruning to fully transformer-based trackers to reduce this complexity, but pruning discards contextual information that is important for the tracker's regression task. To address this issue, this paper proposes a token fusion method that speeds up inference while avoiding information loss, thereby improving the robustness of the tracker. Specifically, the input to the transformer encoder consists of search tokens and exemplar tokens, and the search tokens are divided into tracking object tokens and background tokens according to their similarity to the exemplar tokens: tokens with greater similarity to the exemplar tokens are identified as tracking object tokens, and those with smaller similarity are identified as background tokens. The tracking object tokens contain the discriminative features of the tracked object. To make the tracker pay more attention to these tokens while reducing computational effort, all tracking object tokens are kept, and the background tokens are weighted by their attention weights and fused into new background tokens, which prevents the loss of contextual information. The proposed token fusion method not only makes inference more efficient but also makes the tracker more robust. Extensive experiments on popular tracking benchmark datasets verify the effectiveness of the token fusion method.
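The two steps described in the abstract (splitting the search tokens by their similarity to the exemplar tokens, then fusing the background tokens by attention weight) can be illustrated with a minimal, hypothetical sketch in pure Python. It is not the authors' implementation. It assumes that a single score, the average softmax attention the exemplar tokens pay to each search token, serves as both the similarity and the fusion weight; that the number of kept tracking object tokens, `keep`, is a fixed hyperparameter; and that all background tokens are fused into one new token, whereas the paper forms new background tokens and may group them differently. The names `attention_weights` and `fuse_tokens` are illustrative only.

```python
# Hypothetical sketch of attention-weighted token fusion; NOT the ATFTrans code.
import math
from typing import List, Optional, Tuple

Token = List[float]  # one token embedding


def attention_weights(search: List[Token], exemplar: List[Token]) -> List[float]:
    """Average attention each search token receives from the exemplar tokens.

    Each exemplar token acts as a query: its scaled dot products with all
    search tokens (keys) are softmaxed, and the resulting rows are averaged.
    The weights are non-negative and sum to 1 over the search tokens.
    """
    scale = 1.0 / math.sqrt(len(search[0]))
    weights = [0.0] * len(search)
    for z in exemplar:
        logits = [scale * sum(a * b for a, b in zip(z, s)) for s in search]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        for i, e in enumerate(exps):
            weights[i] += e / total / len(exemplar)
    return weights


def fuse_tokens(
    search: List[Token], exemplar: List[Token], keep: int
) -> Tuple[List[Token], Optional[Token]]:
    """Keep the `keep` most-attended search tokens; fuse the rest into one token.

    Returns (object_tokens, fused_background). object_tokens keep their
    original order; fused_background is the attention-weighted average of the
    remaining (background) tokens, or None if nothing is left to fuse.
    """
    w = attention_weights(search, exemplar)
    ranked = sorted(range(len(search)), key=lambda i: w[i], reverse=True)
    obj_idx, bg_idx = sorted(ranked[:keep]), ranked[keep:]
    object_tokens = [search[i] for i in obj_idx]
    if not bg_idx:
        return object_tokens, None
    mass = sum(w[i] for i in bg_idx)
    if mass == 0.0:  # all weights underflowed: fall back to a plain mean
        coeffs = [1.0 / len(bg_idx)] * len(bg_idx)
    else:  # renormalize the background attention weights to sum to 1
        coeffs = [w[i] / mass for i in bg_idx]
    dim = len(search[0])
    fused = [sum(c * search[i][k] for c, i in zip(coeffs, bg_idx)) for k in range(dim)]
    return object_tokens, fused


if __name__ == "__main__":
    exemplar = [[1.0, 0.0], [0.9, 0.1]]
    search = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]
    kept, fused = fuse_tokens(search, exemplar, keep=2)
    print(kept)   # [[1.0, 0.0], [0.8, 0.2]]: the two tokens closest to the exemplar
    print(fused)  # one token summarizing the two background tokens
```

Because the background tokens are averaged rather than dropped, their context still reaches later encoder layers in this sketch, while the search sequence shrinks from N tokens to `keep + 1`.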
Pages: 7043-7056
Number of pages: 14
Related papers (50 in total)
  • [21] Attention-based Weighted Fusion Network for Object Detection
    Yu, Ruixing
    Wang, Chuyin
    Tang, Yifei
    JOURNAL OF IMAGING SCIENCE AND TECHNOLOGY, 2024, 68 (06) : 1 - 18
  • [22] Learning Soft Mask Based Feature Fusion with Channel and Spatial Attention for Robust Visual Object Tracking
    Fiaz, Mustansar
    Mahmood, Arif
    Jung, Soon Ki
    SENSORS, 2020, 20 (14) : 1 - 25
  • [23] Siamese hierarchical feature fusion transformer for efficient tracking
    Dai, Jiahai
    Fu, Yunhao
    Wang, Songxin
    Chang, Yuchun
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [24] Robust object tracking with background-weighted local kernels
    Jeyakar, Jaideep
    Babu, R. Venkatesh
    Ramakrishnan, K. R.
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2008, 112 (03) : 296 - 309
  • [26] Robust object tracking via local constrained and online weighted
    Zha, Yi
    Cao, Tieyong
    Huang, Hui
    Song, Zhijun
    Liang, Wenhui
    Li, Feibin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (11) : 6481 - 6503
  • [27] Study on Particle Filter Object Tracking Based on Weighted Fusion
    Wen, Zhiqiang
    Zhu, Yanhui
    Peng, Zhaoyi
    MEMS, NANO AND SMART SYSTEMS, PTS 1-6, 2012, 403-408 : 3049 - +
  • [28] Adaptive sparse attention-based compact transformer for object tracking
    Pan, Fei
    Zhao, Lianyu
    Wang, Chenglin
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [29] Hybrid multi-attention transformer for robust video object detection
    Moorthy, Sathishkumar
    Sakthi, Sachin
    Arthanari, Sathiyamoorthi
    Jeong, Jae Hoon
    Joo, Young Hoon
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139
  • [30] Siamese Graph Attention Networks for robust visual object tracking
    Lu, Junjie
    Li, Shengyang
    Guo, Weilong
    Zhao, Manqi
    Yang, Jian
    Liu, Yunfei
    Zhou, Zhuang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 229