Exploiting spatial and temporal context for online tracking with improved transformer

Times cited: 0

Authors:

Zhang, Jianwei [1]
Wang, Jingchao [1]
Zhang, Huanlong [2]
Miao, Mengen [1]
Zhang, Jie [2]
Wu, Di [3]

Affiliations:

[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 475000, Peoples R China

[2] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 475000, Peoples R China

[3] Yellow River Engn Consulting Co Ltd, Zhengzhou 450003, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2023 / Vol. 133

Funding:

National Natural Science Foundation of China

Keywords:

Visual tracking; Classification and regression network; Spatial and temporal context; Transformer

DOI:

10.1016/j.imavis.2023.104672

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Transformers are becoming increasingly popular in computer vision tasks because self-attention can capture long-range dependencies. In this paper, we propose TrCAR, a transformer-based classification and regression network that uses the transformer to exploit deeper spatial and temporal context. Unlike the classic transformer architecture, ours introduces convolution operations into the transformer and modifies how features are computed, making it suitable for the tracking task. The improved transformer encoder is then introduced into the regression branch of TrCAR and combined with a feature pyramid to perform multi-layer feature fusion, which helps obtain a high-quality target representation. To further enable the target model to adapt to changes in target appearance, we apply gradient descent to the regression branch so that it can be updated online and produce more precise bounding boxes. Meanwhile, the improved transformer is integrated into the classification branch of TrCAR: using its global computing capability, it extracts as much of the target's essential features across historical frames as possible and then uses them, via cross-attention, to emphasize the target position in the current frame. This helps the classifier identify the correct target more easily. Experimental results on the OTB, LaSOT, VOT2018, NFS, GOT-10k, and TrackingNet benchmarks show that TrCAR achieves performance comparable to that of popular trackers.
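
The abstract describes the classification branch only at a high level. As a hedged illustration of the general mechanism it relies on, the sketch below implements generic single-head scaled dot-product cross-attention in plain Python, with current-frame features acting as queries over a memory of target features from earlier frames. It is not TrCAR's implementation: it omits learned query/key/value projections, the convolution operations, and the modified feature computation the paper introduces. The names `cross_attention` and `target_response` and the toy feature vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries: current-frame features, one vector per spatial location.
    keys, values: features taken from historical frames (the "memory").
    Returns, for each query, a softmax-weighted sum of the value vectors.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim_v)])
    return outputs


def target_response(queries: List[Vector], memory: List[Vector]) -> Vector:
    # Hypothetical helper: score each current-frame location by the dot
    # product between its feature and the memory feature it attended to.
    attended = cross_attention(queries, memory, memory)
    return [sum(a * b for a, b in zip(q, att))
            for q, att in zip(queries, attended)]


# Toy example: two stored target features and two current-frame locations.
history = [[1.0, 0.0], [0.9, 0.1]]   # assumed target appearance in past frames
current = [[1.0, 0.0], [0.0, 1.0]]   # location 0 resembles the target
response = target_response(current, history)
print(response)  # location 0 scores far higher than location 1
```

A location whose feature resembles the stored target appearance aligns with the memory it attends to and therefore receives a higher response. This is the intuition behind using cross-attention over historical frames to emphasize the target position in the current frame.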

Pages: 11
