SslTransT: Self-supervised pre-training visual object tracking with Transformers

Cited by: 0
Authors
Cai, Yannan [1 ]
Tan, Ke [1 ]
Wei, Zhenzhong [1 ]
Affiliations
[1] Beihang Univ, Sch Instrumentat Sci & Optoelect Engn, Key Lab Precis Optomechatron Technol, Minist Educ, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised; Hybrid CNN-transformer; Visual object tracking; 6D pose measurement system; BENCHMARK;
DOI
10.1016/j.optcom.2024.130329
CLC (Chinese Library Classification)
O43 [Optics];
Subject classification code
070207 ; 0803 ;
Abstract
Transformer-based visual object tracking surpasses conventional CNN-based counterparts in performance but comes with additional computational overhead. Existing Transformer-based trackers rely on large-scale annotated data and long training periods. To address this issue, we introduce a self-supervised pretext task, named target localization, which randomly crops the target and then pastes it onto various background images. This copy-paste-transform data augmentation strategy can synthesize sufficient training data and facilitate routine training. In addition, freezing the CNN backbone during pre-training and randomly adjusting the template and search-area factors further accelerate training convergence. Extensive experiments on both public tracking benchmarks and real aircraft flight test videos demonstrate that our proposed tracker, SslTransT, significantly outperforms the baseline while requiring only half the training time. Furthermore, we apply SslTransT to a 6D pose measurement system based on vision and laser ranging, achieving excellent tracking results while running in real time.
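The abstract describes the pretext task only at a high level, so the following is a minimal Python sketch of what a copy-paste-transform augmentation could look like: a target patch is cropped from one frame, randomly rescaled, and pasted at a random position on an unrelated background image, and the resulting box serves as the pseudo ground truth for localization. The function name, parameters, and jitter ranges are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a copy-paste-transform augmentation for a
# "target localization" pretext task. Assumes numpy and OpenCV only.
import random
import numpy as np
import cv2


def copy_paste_transform(target_img, target_box, background_img,
                         scale_range=(0.5, 1.5)):
    """Crop the target (box = [x, y, w, h] in pixels) from target_img,
    randomly rescale it, and paste it at a random location on
    background_img. Returns the composite image and the new box,
    which acts as the self-supervised localization label."""
    x, y, w, h = target_box
    patch = target_img[y:y + h, x:x + w]

    # Random scale jitter (assumed range).
    s = random.uniform(*scale_range)
    new_w, new_h = max(1, int(w * s)), max(1, int(h * s))
    patch = cv2.resize(patch, (new_w, new_h))

    bg = background_img.copy()
    bg_h, bg_w = bg.shape[:2]
    if new_w >= bg_w or new_h >= bg_h:
        # Fallback: shrink the patch so it always fits the background.
        new_w, new_h = bg_w // 2, bg_h // 2
        patch = cv2.resize(patch, (new_w, new_h))

    # Random paste location fully inside the background.
    px = random.randint(0, bg_w - new_w)
    py = random.randint(0, bg_h - new_h)
    bg[py:py + new_h, px:px + new_w] = patch

    return bg, (px, py, new_w, new_h)


if __name__ == "__main__":
    # Toy usage with random arrays standing in for real frames.
    frame = np.random.randint(0, 255, (360, 480, 3), dtype=np.uint8)
    background = np.random.randint(0, 255, (360, 480, 3), dtype=np.uint8)
    composite, pseudo_box = copy_paste_transform(frame, (100, 80, 64, 48), background)
    print("pseudo label box:", pseudo_box)
```

In a tracking pipeline, the template and search-area crops would then be sampled around this pseudo box (with the randomized crop factors mentioned in the abstract), so the tracker learns to localize the pasted target without any manual annotation.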
Pages: 10
Related papers
50 records in total
  • [31] Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training
    Dave, Vedant
    Lygerakis, Fotios
    Rueckert, Elmar
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 8013 - 8020
  • [32] Self-Supervised Pre-Training Joint Framework: Assisting Lightweight Detection Network for Underwater Object Detection
    Wang, Zhuo
    Chen, Haojie
    Qin, Hongde
    Chen, Qin
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2023, 11 (03)
  • [33] Joint Encoder-Decoder Self-Supervised Pre-training for ASR
    Arunkumar, A.
    Umesh, S.
    INTERSPEECH 2022, 2022, : 3418 - 3422
  • [34] ENHANCING THE DOMAIN ROBUSTNESS OF SELF-SUPERVISED PRE-TRAINING WITH SYNTHETIC IMAGES
    Hassan, Mohamad N. C.
    Bhattacharya, Avigyan
    da Costa, Victor G. Turrisi
    Banerjee, Biplab
    Ricci, Elisa
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5470 - 5474
  • [35] Individualized Stress Mobile Sensing Using Self-Supervised Pre-Training
    Islam, Tanvir
    Washington, Peter
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [36] Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
    Huang, Sung-Feng
    Chuang, Shun-Po
    Liu, Da-Rong
    Chen, Yi-Chen
    Yang, Gene-Ping
    Lee, Hung-yi
    INTERSPEECH 2021, 2021, : 3056 - 3060
  • [37] Self-Supervised Pre-training for Protein Embeddings Using Tertiary Structures
    Guo, Yuzhi
    Wu, Jiaxiang
    Ma, Hehuan
    Huang, Junzhou
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6801 - 6809
  • [38] DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder
    Zhang, Zhenyu
    Guo, Tao
    Chen, Meng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3647 - 3651
  • [39] Progressive self-supervised learning: A pre-training method for crowd counting
    Gu, Yao
    Zheng, Zhe
    Wu, Yingna
    Xie, Guangping
    Ni, Na
    PATTERN RECOGNITION LETTERS, 2025, 188 : 148 - 154
  • [40] GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION
    Khare, Aparna
    Wu, Minhua
    Bhati, Saurabhchand
    Droppo, Jasha
    Maas, Roland
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 174 - 181