Transformer Tracking for Satellite Video: Matching, Propagation, and Prediction

Authors:

Zhao, Manqi [1,2]

Li, Shengyang [1,3]

Yang, Jian [1,3]

Affiliations:

[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing 100094, Peoples R China

[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China

[3] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China

Source:

IEEE Transactions on Geoscience and Remote Sensing | 2024 / Vol. 62

Keywords:

Target tracking; Satellites; Transformers; Training; Object tracking; Predictive models; Pipelines; Adaptation models; Feature extraction; Accuracy; Satellite video object tracking; Sequence prediction; Static matching; Temporal propagation; Transformer; Correlation filter

DOI:

10.1109/TGRS.2024.3501380

Chinese Library Classification (CLC) codes:

P3 [Geophysics]; P59 [Geochemistry]

Discipline classification codes:

0708; 070902

Abstract:

Recently, transformer-based trackers have brought overwhelming advantages in general video. However, their performance in satellite video has been hindered by insufficient satellite-specific training and a lack of designs tailored to satellite targets and scene characteristics. To tackle these challenges, we propose a novel transformer-based tracking framework for satellite video object tracking: Transformer Matching, Propagation, and Prediction (TransMPP). TransMPP combines three stages: static matching, dynamic propagation, and prediction, to ensure accurate tracking in satellite videos. Specifically, the Matching model uses a one-stream pipeline for simultaneous feature extraction and relationship modeling across extensive search and template areas, thereby improving foreground and background discrimination capabilities. In addition, the Propagation and Prediction models enhance temporal modeling capabilities through local long-term and short-term feature propagation and global sequence prediction, respectively, boosting tracking robustness. Moreover, to ensure a fair comparison and evaluation, we also developed SatSOT-train, a large-scale training dataset for the SatSOT benchmark. After comprehensive training, TransMPP demonstrates state-of-the-art (SOTA) performance on the SatSOT dataset, achieving an area under the curve (AUC) score of 59.9% and a precision score of 71.5%, bringing improvements of 6.3% and 5.3%, respectively. The code will be available at https://github.com/DonDominic/TransMPP.
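
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how a success-plot AUC and a precision score are conventionally computed for single-object tracking. It is not taken from the TransMPP code. The function names, the (x, y, w, h) box format, the 21 overlap thresholds, and the 5-pixel precision threshold are all assumptions made for illustration; the paper's evaluation protocol may differ.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))

def success_auc(pred, gt, n_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    n_thresholds=21 is an assumed setting, not a value from the source.
    """
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)

def precision(pred, gt, pixel_threshold=5.0):
    """Fraction of frames whose center error is within pixel_threshold.

    pixel_threshold=5.0 is an assumed setting, not a value from the source.
    """
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= pixel_threshold for e in errors) / len(errors)

if __name__ == "__main__":
    # Hypothetical per-frame boxes, for illustration only.
    gt = [(10, 10, 8, 8), (12, 11, 8, 8), (14, 12, 8, 8)]
    pred = [(10, 10, 8, 8), (13, 11, 8, 8), (30, 30, 8, 8)]
    print("AUC:", success_auc(pred, gt))
    print("Precision:", precision(pred, gt))
```

Both functions return values in [0, 1]; multiplying by 100 gives percentages of the kind reported in the abstract.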

Pages: 16
