SslTransT: Self-supervised pre-training visual object tracking with Transformers

Times Cited: 0
Authors
Cai, Yannan [1 ]
Tan, Ke [1 ]
Wei, Zhenzhong [1 ]
Affiliations
[1] Beihang Univ, Sch Instrumentat Sci & Optoelect Engn, Key Lab Precis Optomechatron Technol, Minist Educ, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised; Hybrid CNN-transformer; Visual object tracking; 6D pose measurement system; BENCHMARK;
DOI
10.1016/j.optcom.2024.130329
CLC Classification
O43 [Optics];
Discipline Codes
070207; 0803;
Abstract
Transformer-based visual object tracking surpasses conventional CNN-based counterparts in performance but incurs additional computational overhead. Existing Transformer-based trackers rely on large-scale annotated data and long training periods. To address this issue, we introduce a self-supervised pretext task, named target localization, which randomly crops the target and then pastes it onto various background images. This copy-paste-transform data augmentation strategy can synthesize sufficient training data and facilitate routine training. In addition, freezing the CNN backbone during pre-training and randomly adjusting the template and search area factors further lead to faster training convergence. Extensive experiments on both public tracking benchmarks and real aircraft flight test videos demonstrate that our proposed tracker, SslTransT, significantly outperforms the baseline while requiring only half the training time. Furthermore, we apply SslTransT to a 6D pose measurement system based on vision and laser ranging, achieving excellent tracking results while running in real time.
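The copy-paste-transform augmentation and the random adjustment of template/search area factors described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden illustration only: the function names (`copy_paste_pair`, `crop_region`) and parameters (`template_factor`, `search_factor`, `jitter`) are hypothetical and do not come from the authors' implementation.

```python
"""Minimal sketch of a copy-paste-transform augmentation for a
"target localization" pretext task. Hypothetical names/parameters."""
import random
import numpy as np


def crop_region(img, box, area_factor):
    """Square crop centred on the box, with side = area_factor * sqrt(w*h)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    side = int(area_factor * np.sqrt(w * h))
    H, W = img.shape[:2]
    x0 = int(np.clip(cx - side / 2.0, 0, max(0, W - side)))
    y0 = int(np.clip(cy - side / 2.0, 0, max(0, H - side)))
    return img[y0:y0 + side, x0:x0 + side]


def copy_paste_pair(target_img, target_box, background_img,
                    template_factor=2.0, search_factor=4.0, jitter=0.5):
    """Crop the target from one frame, paste it at a random location on an
    unrelated background, and return a (template, search, box) training
    triplet. Area factors are randomly jittered to mimic the random
    template/search area-factor adjustment mentioned in the abstract."""
    x, y, w, h = target_box
    patch = target_img[y:y + h, x:x + w].copy()

    # Paste the target patch at a random position on the background image.
    H, W = background_img.shape[:2]
    px = random.randint(0, max(0, W - w))
    py = random.randint(0, max(0, H - h))
    composite = background_img.copy()
    composite[py:py + h, px:px + w] = patch
    pasted_box = (px, py, w, h)

    # Randomly perturb the crop-area factors, then crop the template and
    # search regions around the pasted target.
    tf = template_factor * random.uniform(1 - jitter, 1 + jitter)
    sf = search_factor * random.uniform(1 - jitter, 1 + jitter)
    template = crop_region(composite, pasted_box, tf)
    search = crop_region(composite, pasted_box, sf)
    return template, search, pasted_box
```

The abstract's other ingredient, freezing the CNN backbone during pre-training, would in a typical PyTorch pipeline amount to setting `requires_grad = False` on the backbone parameters before optimization; the exact procedure used by the authors is not detailed here.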
Pages: 10
Related Papers
50 records in total
  • [41] Class incremental learning with self-supervised pre-training and prototype learning
    Liu, Wenzhuo
    Wu, Xin-Jian
    Zhu, Fei
    Yu, Ming-Ming
    Wang, Chuang
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2025, 157
  • [42] Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering
    Yang, Yaming
    Guan, Ziyu
    Wang, Zhe
    Zhao, Wei
    Xu, Cai
    Lu, Weigang
    Huang, Jianbin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [43] Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds
    Hess, Georg
    Jaxing, Johan
    Svensson, Elias
    Hagerman, David
    Petersson, Christoffer
    Svensson, Lennart
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW), 2023, : 350 - 359
  • [44] Feature-Suppressed Contrast for Self-Supervised Food Pre-training
    Liu, Xinda
    Zhu, Yaohui
    Liu, Linhu
    Tian, Jiang
    Wang, Lili
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4359 - 4367
  • [45] Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection
    Shim, Hye-jin
    Heo, Hee-Soo
    Jung, Jee-weon
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020, : 1091 - 1095
  • [46] PreTraM: Self-supervised Pre-training via Connecting Trajectory and Map
    Xu, Chenfeng
    Li, Tian
    Tang, Chen
    Sun, Lingfeng
    Keutzer, Kurt
    Tomizuka, Masayoshi
    Fathi, Alireza
    Zhan, Wei
    COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 34 - 50
  • [47] MULTI-TASK SELF-SUPERVISED PRE-TRAINING FOR MUSIC CLASSIFICATION
    Wu, Ho-Hsiang
    Kao, Chieh-Chi
    Tang, Qingming
    Sun, Ming
    McFee, Brian
    Bello, Juan Pablo
    Wang, Chao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 556 - 560
  • [48] Self-supervised Pre-training and Semi-supervised Learning for Extractive Dialog Summarization
    Zhuang, Yingying
    Song, Jiecheng
    Sadagopan, Narayanan
    Beniwal, Anurag
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 1069 - 1076
  • [49] Self-Supervised Underwater Image Generation for Underwater Domain Pre-Training
    Wu, Zhiheng
    Wu, Zhengxing
    Chen, Xingyu
    Lu, Yue
    Yu, Junzhi
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 14
  • [50] COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH
    Poncelet, Jakob
    Hamme, Hugo Van
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 169 - 176