Transformer Tracking for Satellite Video: Matching, Propagation, and Prediction

Cited by: 0
Authors
Zhao, Manqi [1, 2]
Li, Shengyang [1, 3]
Yang, Jian [1, 3]
Affiliations
[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China
Keywords
Target tracking; Satellites; Transformers; Training; Object tracking; Predictive models; Pipelines; Adaptation models; Feature extraction; Accuracy; Satellite video object tracking; sequence prediction; static matching; temporal propagation; transformer; OBJECT TRACKING; CORRELATION FILTER
DOI
10.1109/TGRS.2024.3501380
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Recently, transformer-based trackers have shown clear advantages on general video. However, their performance on satellite video has been hindered by insufficient satellite-specific training and a lack of designs tailored to satellite targets and scene characteristics. To tackle these challenges, we propose a novel transformer-based framework for satellite video object tracking: Transformer Matching, Propagation, and Prediction (TransMPP). TransMPP combines three stages (static matching, dynamic propagation, and prediction) to ensure accurate tracking in satellite videos. Specifically, the Matching model uses a one-stream pipeline that performs feature extraction and relationship modeling simultaneously across extensive search and template areas, thereby improving discrimination between foreground and background. In addition, the Propagation and Prediction models strengthen temporal modeling through local long-term and short-term feature propagation and global sequence prediction, respectively, which improves tracking robustness. Moreover, to ensure a fair comparison and evaluation, we also developed SatSOT-train, a large-scale training dataset for the SatSOT benchmark. After comprehensive training, TransMPP demonstrates state-of-the-art (SOTA) performance on the SatSOT dataset, achieving an area under the curve (AUC) score of 59.9% and a precision score of 71.5%, improvements of 6.3% and 5.3%, respectively. The code will be available at https://github.com/DonDominic/TransMPP.
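For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a success-plot AUC and a center-error precision score are commonly computed for single-object tracking. It is not the authors' evaluation code. The box format `(x, y, w, h)`, the 21 evenly spaced overlap thresholds, and the 5-pixel precision threshold are assumptions chosen for illustration; the record above does not state the settings used on SatSOT.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred, gt, n_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    n_thresholds=21 is an assumed, commonly used setting.
    """
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)

def precision(pred, gt, pixel_threshold=5.0):
    """Fraction of frames whose center error is within pixel_threshold.

    pixel_threshold=5.0 is an assumption, not a value taken from the source.
    """
    hits = 0
    for (px, py, pw, ph), (gx, gy, gw, gh) in zip(pred, gt):
        dist = math.hypot((px + pw / 2) - (gx + gw / 2),
                          (py + ph / 2) - (gy + gh / 2))
        hits += dist <= pixel_threshold
    return hits / len(pred)
```

Both functions take per-frame lists of predicted and ground-truth boxes for one sequence; a benchmark score would typically average the per-sequence values.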
Pages: 16