Transformer visual object tracking algorithm based on mixed attention

Cited by: 0
Authors
Hou Z.-Q. [1 ]
Guo F. [1 ]
Yang X.-L. [1 ]
Ma S.-G. [1 ]
Fan J.-L. [2 ]
Affiliations
[1] School of Computer, Xi'an University of Posts & Telecommunications, Xi'an
[2] School of Communication and Information Engineering, Xi'an University of Posts & Telecommunications, Xi'an
Source
Kongzhi yu Juece/Control and Decision | 2024, Vol. 39, No. 3
Keywords
attention mechanism; computer vision; deep learning; object tracking; siamese network; Transformer;
DOI
10.13195/j.kzyjc.2022.1340
Abstract
Transformer-based visual object tracking algorithms capture the global information of the target well, but the representation of object features can still be improved. To strengthen the expressive ability of object features, a Transformer visual object tracking algorithm based on mixed attention is proposed. First, a mixed attention module is introduced to capture object features in both the spatial and channel dimensions, modeling the contextual dependencies of the target features. Second, the feature maps are sampled by multiple parallel dilated convolutions with different dilation rates to obtain multi-scale image features and enhance the local feature representation. Finally, a convolutional position encoding is added to the Transformer encoder to provide accurate, length-adaptive position codes for the tracker, improving localization accuracy. Experimental results on OTB100, VOT2018 and LaSOT show that, by learning the relationships between features through the mixed-attention Transformer network, object features are represented more effectively. Compared with other mainstream object tracking algorithms, the proposed algorithm achieves better tracking performance and runs in real time at 26 frames per second. © 2024 Northeast University. All rights reserved.
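To make the two feature-enhancement ideas in the abstract concrete, the following is a minimal NumPy sketch of (a) a mixed attention module that reweights a feature map along the channel and spatial dimensions, and (b) parallel dilated convolutions with different dilation rates. This is an illustrative toy, not the paper's actual architecture: the real module uses learned parameters and is embedded in a Transformer, whereas here the attention weights are derived directly from feature statistics and the convolution kernel is a fixed box filter; all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W); squeeze spatial dims into one descriptor per channel,
    # then reweight each channel by its gated descriptor
    desc = feat.mean(axis=(1, 2))                 # (C,)
    return feat * sigmoid(desc)[:, None, None]

def spatial_attention(feat):
    # average across channels into a single spatial map and gate each location
    desc = feat.mean(axis=0)                      # (H, W)
    return feat * sigmoid(desc)[None, :, :]

def mixed_attention(feat):
    # combine the channel and spatial branches (sum fusion is an assumption)
    return channel_attention(feat) + spatial_attention(feat)

def dilated_conv2d(x, kernel, rate):
    # x: (H, W), kernel: (k, k); zero-padded 'same' convolution whose taps
    # are spaced `rate` pixels apart (dilation)
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = (patch * kernel).sum()
    return out

def multi_scale(feat, rates=(1, 2, 3)):
    # parallel dilated-convolution branches with different rates, fused by
    # summation; a shared 3x3 box kernel stands in for learned filters
    kernel = np.full((3, 3), 1.0 / 9.0)
    return sum(
        np.stack([dilated_conv2d(c, kernel, r) for c in feat]) for r in rates
    )

feat = np.random.rand(8, 16, 16).astype(np.float32)
enhanced = multi_scale(mixed_attention(feat))
print(enhanced.shape)  # (8, 16, 16)
```

Larger dilation rates sample a wider neighborhood without adding parameters, which is why stacking branches with rates 1, 2 and 3 yields multi-scale context while keeping the feature map resolution unchanged.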
Pages: 739-748
Page count: 9