TrackingMamba: Visual State Space Model for Object Tracking

Cited by: 2
|
Authors
Wang, Qingwang [1 ,2 ]
Zhou, Liyao [1 ,2 ]
Jin, Pengcheng [1 ,2 ]
Qu, Xin [1 ,2 ]
Zhong, Hangwei [1 ,2 ]
Song, Haochen [1 ,2 ]
Shen, Tao [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Comp Technol Applicat, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object tracking; Autonomous aerial vehicles; Transformers; Feature extraction; Computational modeling; Accuracy; Visualization; Jungle scenes; Mamba; object tracking; UAV remote sensing;
DOI
10.1109/JSTARS.2024.3458938
CLC Classification Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
In recent years, UAV object tracking has provided technical support across various fields. Most existing work relies on convolutional neural networks (CNNs) or vision transformers. However, CNNs have limited receptive fields, resulting in suboptimal performance, while transformers require substantial computational resources, making training and inference challenging. Mountainous and jungle environments, which are critical components of the Earth's surface and key scenarios for UAV object tracking, present unique challenges: steep terrain, dense vegetation, and rapidly changing weather conditions complicate UAV tracking, and the lack of relevant datasets further reduces tracking accuracy. This article introduces TrackingMamba, a new tracking framework based on a state space model, which adopts a single-stream tracking architecture with Vision Mamba as its backbone. TrackingMamba matches transformer-based trackers in global feature extraction and long-range dependency modeling while keeping computational cost that grows only linearly with sequence length. Compared with other advanced trackers, TrackingMamba delivers higher accuracy with a simpler model framework, fewer parameters, and fewer FLOPs. Specifically, on the UAV123 benchmark, TrackingMamba outperforms the baseline OSTrack-256, improving AUC by 2.59% and precision by 4.42% while reducing parameters by 95.52% and FLOPs by 95.02%. The article also evaluates the performance and shortcomings of TrackingMamba and other advanced trackers in the complex and critical context of jungle environments, and it explores potential future research directions for UAV object tracking in jungle scenes.
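To make the single-stream idea in the abstract concrete, the sketch below shows how a tracker of this kind can be organized: template and search-region crops are patch-embedded, their token sequences are concatenated, and the joint sequence is processed by a stack of state-space-style blocks so that feature extraction and template-search relation modeling happen in one stream. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the toy box head, and the PlaceholderSSMBlock are hypothetical, and a real TrackingMamba-style backbone would use Vision Mamba (selective-scan SSM) blocks rather than the stand-in token mixer used here so the example runs without extra dependencies.

```python
import torch
import torch.nn as nn


class PlaceholderSSMBlock(nn.Module):
    """Stand-in for a Vision Mamba (state space) block.

    A real implementation would use a selective-scan SSM block; a gated
    MLP with a residual connection is used here only so this sketch runs
    with plain PyTorch.
    """
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Sequential(nn.Linear(dim, 2 * dim), nn.SiLU(), nn.Linear(2 * dim, dim))

    def forward(self, x):                       # x: (B, L, D)
        return x + self.mix(self.norm(x))       # residual token mixing


class SingleStreamTrackerSketch(nn.Module):
    """Minimal single-stream tracker sketch (hypothetical sizes)."""
    def __init__(self, dim=192, depth=12, patch=16):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patch embedding
        self.blocks = nn.ModuleList(PlaceholderSSMBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, 4)           # toy box head: (cx, cy, w, h)

    def tokenize(self, img):
        return self.embed(img).flatten(2).transpose(1, 2)  # (B, N, D) tokens

    def forward(self, template, search):
        z = self.tokenize(template)             # template tokens
        x = self.tokenize(search)               # search-region tokens
        tokens = torch.cat([z, x], dim=1)       # one joint sequence: features and
                                                # template-search relations in one stream
        for blk in self.blocks:
            tokens = blk(tokens)
        search_tokens = tokens[:, z.shape[1]:]  # keep only the search-region tokens
        return self.head(search_tokens.mean(dim=1))  # predict a normalized box


if __name__ == "__main__":
    tracker = SingleStreamTrackerSketch()
    z = torch.randn(1, 3, 128, 128)             # template crop
    x = torch.randn(1, 3, 256, 256)             # search-region crop
    print(tracker(z, x).shape)                  # torch.Size([1, 4])
```

Because each block mixes the concatenated token sequence directly, there is no separate correlation or cross-attention stage; in the paper's setting the SSM blocks give this joint processing a cost that grows linearly with the number of tokens, which is the efficiency argument made in the abstract.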
Pages: 16744 - 16754
Page count: 11
Related Papers
50 records in total
  • [21] Visual object tracking method based on local patch model and model update
    Hou, Zhi-Qiang
    Huang, An-Qi
    Yu, Wang-Sheng
    Liu, Xiang
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2015, 37 (06) : 1357 - 1364
  • [22] Explicit Visual Prompts for Visual Object Tracking
    Shi, Liangtao
    Zhong, Bineng
    Liang, Qihua
    Li, Ning
    Zhang, Shengping
    Li, Xianxian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4838 - 4846
  • [23] Patch Regularization in Visual State Space Model
    Hong, Junyoung
    Yang, Hyeri
    Kim, Ye Ju
    Kim, Shinwoong
    Lee, Kyungjae
    2024 INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS, AND COMMUNICATIONS, ITC-CSCC 2024, 2024,
  • [24] Tracking in object action space
    Kruger, Volker
    Herzog, Dennis
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2013, 117 (07) : 764 - 789
  • [25] A model of space and object-based attention for visual saliency
    Zhong, Jingjing
    Luo, Siwei
    ICICIC 2006: FIRST INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING, INFORMATION AND CONTROL, VOL 2, PROCEEDINGS, 2006, : 237 - +
  • [26] Low-resolution color-based visual tracking with state-space model identification
    Lu, Xin
    Nishiyama, Kiyoshi
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2010, 114 (09) : 1045 - 1054
  • [27] Discriminative appearance model with template spatial adjustment for visual object tracking
    Vadamala, Purandhar Reddy
    Aklak, Annis Fathima
    SOFT COMPUTING, 2023, 27 (14) : 9787 - 9800
  • [28] Discriminative appearance model with template spatial adjustment for visual object tracking
    Purandhar Reddy Vadamala
    Annis Fathima Aklak
    Soft Computing, 2023, 27 : 9787 - 9800
  • [29] Sparse Selective Kernelized Correlation Filter Model for Visual Object Tracking
    Lu, Xiaohuan
    Yuan, Di
    He, Zhenyu
    Li, Donghao
    2017 INTERNATIONAL CONFERENCE ON SECURITY, PATTERN ANALYSIS, AND CYBERNETICS (SPAC), 2017, : 100 - 105
  • [30] A motion model based on recurrent neural networks for visual object tracking
    Shahbazi, Mohammad
    Bayat, Mohammad Hosein
    Tarvirdizadeh, Bahram
    IMAGE AND VISION COMPUTING, 2022, 126