Propagating prior information with transformer for robust visual object tracking

Authors
Wu, Yue [1]
Cai, Chengtao [1, 2]
Yeo, Chai Kiat [3]
Affiliations
[1] Harbin Engn Univ, Sch Intelligent Sci & Engn, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Key Lab Intelligent Technol & Applicat Marine Equi, Minist Educ, Harbin 150001, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Keywords
Visual object tracking; Siamese network; Transformer; Prior information; VIDEO;
DOI
10.1007/s00530-024-01423-8
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, visual object tracking has advanced considerably with the adoption of deep learning. Siamese-based trackers have been pivotal, establishing an architecture built on a weight-shared backbone. With the introduction of the transformer, the attention mechanism has been exploited to enhance feature discriminability across successive frames. However, many existing trackers adapt poorly to different tracking scenarios, which leads to inaccurate target localization. To address this issue, we integrate a Siamese network with a transformer. The Siamese network uses ResNet50 as the backbone to extract target features, while the transformer consists of an encoder and a decoder. The encoder exploits global contextual information to obtain discriminative features, and the decoder propagates prior information related to the target. This enables the tracker to locate the target in a variety of environments and improves its stability and robustness. Extensive experiments on four major public datasets, OTB100, UAV123, GOT10k, and LaSOT_ext, demonstrate the effectiveness of the proposed method, whose performance surpasses that of many state-of-the-art trackers. Additionally, the proposed tracker achieves a tracking speed of 60 fps, meeting the requirements for real-time tracking.
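The abstract says the decoder propagates prior information about the target. As a rough illustration only, the sketch below uses plain scaled dot-product cross-attention, a standard transformer building block, to carry a per-token target mask from a prior frame over to search-region tokens. The toy feature values, the mask, and the names `cross_attention`, `prior_feats`, `prior_mask`, and `search_feats` are all hypothetical. The abstract does not specify the actual decoder design, so this should not be read as the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    Each query attends over all keys; the output is the attention-weighted
    sum of the corresponding values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


# Hypothetical toy data (assumption, not taken from the paper).
# Two tokens from a prior frame: the first lies on the target, the second
# on the background, as recorded in the prior mask.
prior_feats = [[1.0, 0.0], [0.0, 1.0]]
prior_mask = [[1.0], [0.0]]

# Three tokens from the current search region.
search_feats = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]

# Search tokens act as queries, prior features as keys, and the prior mask
# as values, so each search token receives a soft target score.
propagated = cross_attention(search_feats, prior_feats, prior_mask)
```

In this toy setting, a search token that resembles the prior target token receives a score close to 1, one that resembles the background token receives a score close to 0, and an ambiguous token receives 0.5.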
Pages: 14
Related papers
50 in total
  • [41] Transformer-Based Visual Object Tracking with Global Feature Enhancement
    Wang, Shuai
    Fang, Genwen
    Liu, Lei
    Wang, Jun
    Zhu, Kongfen
    Melo, Silas N.
    APPLIED SCIENCES-BASEL, 2023, 13 (23):
  • [42] Siamese Graph Attention Networks for robust visual object tracking
    Lu, Junjie
    Li, Shengyang
    Guo, Weilong
    Zhao, Manqi
    Yang, Jian
    Liu, Yunfei
    Zhou, Zhuang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 229
  • [43] Biogeography based optimization method for robust visual object tracking
    Daneshyar, Seyed Abbas
    Charkari, Nasrollah Moghadam
    APPLIED SOFT COMPUTING, 2022, 122
  • [44] Robust Template Adjustment Siamese Network for Object Visual Tracking
    Tang, Chuanming
    Qin, Peng
    Zhang, Jianlin
    SENSORS, 2021, 21 (04) : 1 - 17
  • [45] Trajectory Guided Robust Visual Object Tracking With Selective Remedy
    Wang, Han
    Liu, Jing
    Su, Yuting
    Yang, Xiaokang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (07) : 3425 - 3440
  • [46] Robust Visual Object Tracking with Top-down Reasoning
    Zhang, Mengdan
    Feng, Jiashi
    Hu, Weiming
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 226 - 234
  • [47] ROBUST TRAJECTORY TRACKING WITH OPTIMAL VISUAL SERVOING ON A DEFORMABLE OBJECT
    Derrar, Yasser
    Saidi, Farah
    Malti, Abed
    International Journal of Robotics and Automation, 2023, 38 (03): 180 - 193
  • [48] Improved Hierarchical Convolutional Features for Robust Visual Object Tracking
    Sun, Jinping
    COMPLEXITY, 2021, 2021
  • [49] Deep Spatial and Temporal Network for Robust Visual Object Tracking
    Teng, Zhu
    Xing, Junliang
    Wang, Qiang
    Zhang, Baopeng
    Fan, Jianping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 1762 - 1775
  • [50] Deterministic Method of Visual Servoing: robust object tracking by drone
    Ouchatti Zakaria
    Bensaid Alaa
    Moutaouakkil Fouad
    Medromi Hicham
    2016 13TH INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS, IMAGING AND VISUALIZATION (CGIV), 2016, : 414 - 422