An efficient object tracking based on multi-head cross-attention transformer

Cited: 0
|
Authors
Dai, Jiahai [1 ]
Li, Huimin [1 ]
Jiang, Shan [1 ]
Yang, Hongwei [1 ]
Affiliations
[1] Changchun Univ Sci & Technol, Coll Comp Sci & Technol, 7089 Weixing Rd, Changchun, Peoples R China
Keywords
cross-attention; multi-head; object tracking; transformer;
DOI
10.1111/exsy.13650
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object tracking is an essential component of computer vision and plays a significant role in various practical applications. Recently, transformer-based trackers have become the predominant tracking method owing to their robustness and efficiency. However, existing transformer-based trackers typically focus solely on template features, neglecting the interactions between search features and template features during tracking. To address this issue, this article introduces a multi-head cross-attention transformer for visual tracking (MCTT), which effectively enhances the interaction between the template branch and the search branch, enabling the tracker to prioritize discriminative features. Additionally, an auxiliary segmentation mask head is designed to produce a pixel-level feature representation, improving tracking accuracy by predicting a set of binary masks. Comprehensive experiments on benchmark datasets such as LaSOT, GOT-10k, UAV123, and TrackingNet, with comparisons against various advanced methods, demonstrate that our approach achieves promising tracking performance. MCTT achieves an AO score of 72.8 on GOT-10k.
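The template-to-search interaction described in the abstract can be illustrated with a minimal NumPy sketch of multi-head cross-attention: queries come from the search-region features and keys/values from the template features, so every search location attends over the template. This is an illustrative simplification, not the paper's implementation; the token counts, head count, and the omission of learned projection matrices are assumptions made here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(search, template, num_heads):
    """Cross-attention: queries from the search branch (n_s, d),
    keys/values from the template branch (n_t, d)."""
    n_s, d = search.shape
    n_t, _ = template.shape
    assert d % num_heads == 0, "feature dim must divide evenly across heads"
    d_h = d // num_heads
    # Split the channel dim into heads: (num_heads, tokens, d_h).
    q = search.reshape(n_s, num_heads, d_h).transpose(1, 0, 2)
    k = template.reshape(n_t, num_heads, d_h).transpose(1, 0, 2)
    v = template.reshape(n_t, num_heads, d_h).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (num_heads, n_s, n_t).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)
    attn = softmax(scores, axis=-1)
    # Weighted sum of template values, then merge heads back to (n_s, d).
    out = attn @ v
    return out.transpose(1, 0, 2).reshape(n_s, d)
```

Because each search token's output is a convex combination of template features, discriminative template cues are propagated into the search representation, which is the interaction the MCTT design emphasizes.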
Pages: 14
Related Papers
50 records
  • [1] Siamese Network Based on MLP and Multi-head Cross Attention for Visual Object Tracking
    Li, Piaoyang
    Lan, Shiyong
    Sun, Shipeng
    Wang, Wenwu
    Gao, Yongyang
    Yang, Yongyu
    Yu, Guangyu
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PART X, 2023, 14263 : 420 - 431
  • [2] HemoFuse: multi-feature fusion based on multi-head cross-attention for identification of hemolytic peptides
    Zhao, Ya
    Zhang, Shengli
    Liang, Yunyun
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [3] SCATT: Transformer tracking with symmetric cross-attention
    Zhang, Jianming
    Chen, Wentao
    Dai, Jiangxin
    Zhang, Jin
    APPLIED INTELLIGENCE, 2024, 54 (08) : 6069 - 6084
  • [4] SMSTracker: A Self-Calibration Multi-Head Self-Attention Transformer for Visual Object Tracking
    Wang, Zhongyang
    Zhu, Hu
    Liu, Feng
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 80 (01): : 605 - 623
  • [5] Deblurring transformer tracking with conditional cross-attention
    Sun, Fuming
    Zhao, Tingting
    Zhu, Bing
    Jia, Xu
    Wang, Fasheng
    MULTIMEDIA SYSTEMS, 2023, 29 (03) : 1131 - 1144
  • [7] Multi-view Cross-Attention Network for Hyperspectral Object Tracking
    Zhu, Minghao
    Wang, Chongchong
    Wang, Heng
    Yuan, Shanshan
    Song, Lin
    Ma, Zongfang
    PATTERN RECOGNITION AND COMPUTER VISION, PT XIII, PRCV 2024, 2025, 15043 : 32 - 46
  • [8] Diversifying Multi-Head Attention in the Transformer Model
    Ampazis, Nicholas
    Sakketou, Flora
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2024, 6 (04): : 2618 - 2638
  • [9] Combining Graph Contrastive Embedding and Multi-head Cross-Attention Transfer for Cross-Domain Recommendation
    Xiao, Shuo
    Zhu, Dongqing
    Tang, Chaogang
    Huang, Zhenzhen
    DATA SCIENCE AND ENGINEERING, 2023, 8 (03) : 247 - 262