Positional Label for Self-Supervised Vision Transformer

Cited by: 0
Authors
Zhang, Zhemin [1 ]
Gong, Xun [1 ,2 ,3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu, Sichuan, Peoples R China
[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Beijing, Peoples R China
[3] Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Positional encoding is important for the vision transformer (ViT) to capture the spatial structure of the input image, and its general effectiveness in ViT has been established. In this work, we propose training ViT to recognize the positional labels of the patches of the input image; this apparently simple task actually yields a meaningful self-supervisory signal. Building on previous work on ViT positional encoding, we propose two positional labels dedicated to 2D images: absolute position and relative position. Our positional labels can be easily plugged into various current ViT variants and can work in two ways: (a) as an auxiliary training target for vanilla ViT to improve performance, or (b) combined with self-supervised ViT to provide a stronger self-supervised signal for semantic feature learning. Experiments demonstrate that with the proposed self-supervised methods, ViT-B and Swin-B gain improvements of 1.20% and 0.74% (top-1 accuracy) on ImageNet, respectively, and of 6.15% and 1.14% on Mini-ImageNet. The code is publicly available at: https://github.com/zhangzhemin/PositionalLabel.
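The two label types described in the abstract can be sketched for a 2D patch grid in a few lines of numpy. This is an illustrative reconstruction of the idea, not code from the authors' repository; the function names and the class-id scheme for relative offsets are our assumptions.

```python
import numpy as np

def absolute_position_labels(h, w):
    """Absolute positional label: each patch's flattened grid index (0 .. h*w-1).
    These names and this construction are illustrative, not the paper's code."""
    return np.arange(h * w)

def relative_position_labels(h, w):
    """Relative positional label: one class id per 2D offset between patch pairs."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2)   # (N, 2) grid coords
    rel = coords[:, None, :] - coords[None, :, :]         # (N, N, 2) pairwise offsets
    # Shift offsets to be non-negative and flatten (dy, dx) into a single class id,
    # giving (2h-1)*(2w-1) possible classes, as in relative position encodings.
    return (rel[..., 0] + h - 1) * (2 * w - 1) + (rel[..., 1] + w - 1)
```

A classifier head on the patch tokens could then be trained to predict these labels as an auxiliary or self-supervised target. For a 2x2 grid, the absolute labels are simply 0..3, and the relative labels form a 4x4 matrix whose diagonal (zero offset) maps to the central class.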
Pages: 3516-3524
Number of pages: 9
Related Papers
(50 total)
  • [21] Self-Supervised Graph Transformer for Deepfake Detection
    Khormali, Aminollah
    Yuan, Jiann-Shiun
    IEEE ACCESS, 2024, 12 : 58114 - 58127
  • [22] Geometrized Transformer for Self-Supervised Homography Estimation
    Liu, Jiazhen
    Li, Xirong
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9522 - 9531
  • [23] Integrated self-supervised label propagation for label imbalanced sets
    Ge, Zeping
    Yang, Youlong
    Du, Zhenye
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8525 - 8544
  • [24] Self-supervised vision transformers for semantic segmentation
    Gu, Xianfan
    Hu, Yingdong
    Wen, Chuan
    Gao, Yang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [25] Emerging Properties in Self-Supervised Vision Transformers
    Caron, Mathilde
    Touvron, Hugo
    Misra, Ishan
    Jegou, Herve
    Mairal, Julien
    Bojanowski, Piotr
    Joulin, Armand
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9630 - 9640
  • [26] Self-supervised Vision Transformers for Writer Retrieval
    Raven, Tim
    Matei, Arthur
    Fink, Gernot A.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 380 - 396
  • [27] Clinical Outcome Prediction in COVID-19 using Self-supervised Vision Transformer Representations
    Konwer, Aishik
    Prasanna, Prateek
    MEDICAL IMAGING 2022: COMPUTER-AIDED DIAGNOSIS, 2022, 12033
  • [28] Self-Supervised Vision Transformers for Malware Detection
    Seneviratne, Sachith
    Shariffdeen, Ridwan
    Rasnayaka, Sanka
    Kasthuriarachchi, Nuran
    IEEE ACCESS, 2022, 10 : 103121 - 103135
  • [29] PersonViT: large-scale self-supervised vision transformer for person re-identification
    Hu, Bin
    Wang, Xinggang
    Liu, Wenyu
    MACHINE VISION AND APPLICATIONS, 2025, 36 (02)
  • [30] Few-shot segmentation for esophageal OCT images based on self-supervised vision transformer
    Wang, Cong
    Gan, Meng
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (02)