ATFTrans: attention-weighted token fusion transformer for robust and efficient object tracking

Cited by: 2
Authors
Xu, Liang [1 ]
Wang, Liejun [1 ]
Guo, Zhiqing [1 ]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830000, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2024, Vol. 36, Issue 13
Funding
National Natural Science Foundation of China;
Keywords
Fully transformer-based tracker; Token fusion; Information loss; Efficient inference;
DOI
10.1007/s00521-024-09444-0
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, fully transformer-based trackers have achieved impressive tracking results, but at the cost of considerable computational complexity. Some researchers have applied token pruning techniques to fully transformer-based trackers to reduce this cost, but pruning discards contextual information that is important for the tracker's regression task. To address this issue, this paper proposes a token fusion method that speeds up inference while avoiding information loss, thereby improving the tracker's robustness. Specifically, the input to the transformer's encoder consists of search tokens and exemplar tokens, and the search tokens are divided into tracking-object tokens and background tokens according to their similarity to the exemplar tokens: search tokens with greater similarity to the exemplar tokens are identified as tracking-object tokens, and those with smaller similarity as background tokens. Because the tracking-object tokens carry the discriminative features of the tracked object, all of them are kept, which directs the tracker's attention to the object while reducing computation. The background tokens are then weighted by their attention weights and fused into new background tokens, preventing the loss of contextual information. The token fusion method presented in this paper not only enables efficient inference but also makes the tracker more robust. Extensive experiments on popular tracking benchmark datasets verify the validity of the token fusion method.
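The keep-and-fuse procedure described in the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's implementation: `token_fusion`, the use of the mean exemplar vector as the template representation, and the scaled dot-product similarity are all illustrative assumptions; the paper's actual method operates inside the transformer encoder and uses its attention weights.

```python
import math

def token_fusion(search_tokens, exemplar_tokens, keep_ratio=0.5):
    """Illustrative sketch: keep object-like search tokens, fuse the rest.

    search_tokens, exemplar_tokens: lists of equal-length feature vectors.
    Tokens most similar to the exemplar are kept as tracking-object tokens;
    the remainder are merged into one attention-weighted background token
    instead of being pruned away, so contextual information is preserved.
    """
    d = len(search_tokens[0])
    # Mean exemplar vector as a stand-in for the template representation.
    mean_ex = [sum(tok[i] for tok in exemplar_tokens) / len(exemplar_tokens)
               for i in range(d)]
    # Scaled dot-product similarity of each search token to the exemplar.
    sims = [sum(a * b for a, b in zip(tok, mean_ex)) / math.sqrt(d)
            for tok in search_tokens]
    # Keep the top-k most similar tokens as tracking-object tokens.
    k = max(1, int(len(search_tokens) * keep_ratio))
    order = sorted(range(len(search_tokens)), key=lambda i: sims[i], reverse=True)
    kept_idx, bg_idx = order[:k], order[k:]
    kept = [search_tokens[i] for i in kept_idx]
    if not bg_idx:
        return kept
    # Fuse background tokens with softmax weights derived from their
    # similarity scores (a proxy for attention weights).
    mx = max(sims[i] for i in bg_idx)
    w = [math.exp(sims[i] - mx) for i in bg_idx]
    z = sum(w)
    fused = [sum(w[j] * search_tokens[bg_idx[j]][i]
                 for j in range(len(bg_idx))) / z
             for i in range(d)]
    return kept + [fused]
```

With `keep_ratio=0.5`, half the search tokens survive unchanged and the rest collapse into a single fused token, so the encoder's sequence length shrinks while no token is dropped outright.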
Pages: 7043-7056
Number of pages: 14
Related papers
50 records in total
  • [41] Robust object tracking via multi-cue fusion
    Hu, Mengjie
    Liu, Zhen
    Zhang, Jingyu
    Zhang, Guangjun
    SIGNAL PROCESSING, 2017, 139 : 86 - 95
  • [42] Feature fusion for robust object tracking using fragmented particles
    Nigam, Chhabi
    Babu, R. Venkatesh
    Raja, S. Kumar
    Ramakrishnan, K. R.
    2007 FIRST ACM/IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED SMART CAMERAS, 2007, : 273 - +
  • [43] A Fusion Approach for Robust Visual Object Tracking in Crowd Scenes
    Oh, Tae-Hyun
    Joo, Kyungdon
    Kim, Junsik
    Park, Jaesik
    Kweon, In So
    2014 11TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2014, : 558 - 560
  • [44] Robust object tracking based on sparse representation and incremental weighted PCA
    Xing, Xiaofen
    Qiu, Fuhao
    Xu, Xiangmin
    Qing, Chunmei
    Wu, Yinrong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (02) : 2039 - 2057
  • [45] Robust object tracking based on sparse representation and incremental weighted PCA
    Xiaofen Xing
    Fuhao Qiu
    Xiangmin Xu
    Chunmei Qing
    Yinrong Wu
    Multimedia Tools and Applications, 2017, 76 : 2039 - 2057
  • [46] Toward Small Sample Challenge in Intelligent Fault Diagnosis: Attention-Weighted Multidepth Feature Fusion Net With Signals Augmentation
    Zhang, Tianci
    He, Shuilong
    Chen, Jinglong
    Pan, Tongyang
    Zhou, Zitong
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [47] Efficient Transformer-based 3D Object Detection with Dynamic Token Halting
    Ye, Mao
    Meyer, Gregory P.
    Chai, Yuning
    Liu, Qiang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8404 - 8416
  • [48] WFSS: weighted fusion of spectral transformer and spatial self-attention for robust hyperspectral image classification against adversarial attacks
    Lichun Tang
    Zhaoxia Yin
    Hang Su
    Wanli Lyu
    Bin Luo
    Visual Intelligence, 2 (1):
  • [49] Sinogram domain angular upsampling of sparse-view micro-CT with dense residual hierarchical transformer and attention-weighted loss
    Adisheshaa, Amogh Subbakrishna
    Vanselow, Daniel J.
    La Riviere, Patrick
    Cheng, Keith C.
    Huang, Sharon X.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2023, 242
  • [50] Structural pixel-wise target attention for robust object tracking
    Zhang, Huanlong
    Cheng, Liyun
    Zhang, Jianwei
    Huang, Wanwei
    Liu, Xiulei
    Yu, Junyang
    DIGITAL SIGNAL PROCESSING, 2021, 117