Transformer Tracking for Satellite Video: Matching, Propagation, and Prediction

Cited by: 0
Authors
Zhao, Manqi [1, 2]
Li, Shengyang [1, 3]
Yang, Jian [1, 3]
Affiliations
[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China
Keywords
Target tracking; Satellites; Transformers; Training; Object tracking; Predictive models; Pipelines; Adaptation models; Feature extraction; Accuracy; Satellite video object tracking; sequence prediction; static matching; temporal propagation; transformer; correlation filter
DOI
10.1109/TGRS.2024.3501380
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Recently, transformer-based trackers have brought overwhelming advantages in general video. However, their performance in satellite video has been hindered by insufficient satellite-specific training and a lack of designs tailored to satellite targets and scene characteristics. To tackle these challenges, we propose a novel transformer-based tracking framework for satellite video object tracking: Transformer Matching, Propagation, and Prediction (TransMPP). TransMPP combines three stages: static matching, dynamic propagation, and prediction, to ensure accurate tracking in satellite videos. Specifically, the Matching model uses a one-stream pipeline for simultaneous feature extraction and relationship modeling across extensive search and template areas, thereby improving foreground and background discrimination capabilities. In addition, the Propagation and Prediction models enhance temporal modeling capabilities through local long-term and short-term feature propagation and global sequence prediction, respectively, boosting tracking robustness. Moreover, to ensure a fair comparison and evaluation, we also developed SatSOT-train, a large-scale training dataset for the SatSOT benchmark. After comprehensive training, TransMPP demonstrates state-of-the-art (SOTA) performance on the SatSOT dataset, achieving an area under the curve (AUC) score of 59.9% and a precision score of 71.5%, bringing improvements of 6.3% and 5.3%, respectively. The code will be available at https://github.com/DonDominic/TransMPP.
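The abstract reports an AUC score and a precision score but does not say how they are computed. The sketch below follows the convention commonly used for single-object tracking benchmarks: AUC is the mean success rate over a sweep of IoU thresholds, and precision is the fraction of frames whose center location error falls within a pixel threshold. The box format `(x, y, w, h)`, the 21-step threshold sweep, the strict `>` comparison, and all function names are assumptions made for illustration. They are not taken from the paper or from the SatSOT evaluation toolkit, and the pixel threshold is left as a required argument because the abstract does not state one.

```python
from math import hypot


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return hypot(dx, dy)


def success_auc(pred, gt, steps=21):
    """Mean success rate over IoU thresholds evenly spaced in [0, 1].

    Assumption: a frame counts as a success when its IoU is strictly
    greater than the threshold; some toolkits use >= instead.
    """
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision(pred, gt, pixel_threshold):
    """Fraction of frames whose center error is within pixel_threshold."""
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= pixel_threshold for e in errors) / len(errors)


if __name__ == "__main__":
    # Toy two-frame sequence: one exact hit and one complete miss.
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (20, 20, 10, 10)]
    print("AUC:", success_auc(pred, gt))
    print("Precision:", precision(pred, gt, pixel_threshold=5))
```

Both functions return fractions in [0, 1]. A benchmark score would average them over all sequences and express the result as a percentage.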
Pages: 16