Positional Label for Self-Supervised Vision Transformer

Cited by: 0
Authors
Zhang, Zhemin [1 ]
Gong, Xun [1 ,2 ,3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu, Sichuan, Peoples R China
[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Beijing, Peoples R China
[3] Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Positional encoding is important for the vision transformer (ViT) to capture the spatial structure of the input image, and its general effectiveness in ViT has been demonstrated. In this work, we propose to train ViT to recognize the positional labels of the patches of the input image; this apparently simple task yields a meaningful self-supervisory signal. Building on previous work on ViT positional encoding, we propose two positional labels dedicated to 2D images: absolute position and relative position. Our positional labels can be easily plugged into various current ViT variants and can work in two ways: (a) as an auxiliary training target for vanilla ViT to improve performance, or (b) combined with self-supervised ViT to provide a more powerful self-supervised signal for semantic feature learning. Experiments demonstrate that, with the proposed self-supervised methods, ViT-B and Swin-B gain top-1 accuracy improvements of 1.20% and 0.74% on ImageNet, respectively, and 6.15% and 1.14% on Mini-ImageNet. The code is publicly available at: https://github.com/zhangzhemin/PositionalLabel.
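To make the abstract's idea of an auxiliary positional-label target concrete, the following is a minimal sketch of an absolute-position variant: each patch token is classified into one of the patch-grid positions alongside the usual image-label loss. The timm ViT-B/16 backbone, the head design, and the loss weight lambda_pos are illustrative assumptions, not the authors' released implementation; see the repository linked above for the actual code.

```python
# Sketch only: auxiliary "absolute positional label" objective for a ViT.
# Assumptions (not from the paper's code): timm backbone, linear position head,
# fixed loss weight. In practice the task is only informative when the token
# being classified does not trivially encode its position.
import torch
import torch.nn as nn
import timm


class ViTWithPositionalLabel(nn.Module):
    def __init__(self, lambda_pos: float = 0.1):
        super().__init__()
        self.backbone = timm.create_model("vit_base_patch16_224", pretrained=False)
        num_patches = self.backbone.patch_embed.num_patches  # 196 for 224px / 16px patches
        embed_dim = self.backbone.embed_dim
        # Auxiliary head: predict which of the num_patches grid positions a token occupies.
        self.pos_head = nn.Linear(embed_dim, num_patches)
        self.lambda_pos = lambda_pos

    def forward(self, images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        tokens = self.backbone.forward_features(images)         # (B, 1 + N, D)
        cls_token, patch_tokens = tokens[:, 0], tokens[:, 1:]   # split CLS / patch tokens
        cls_loss = nn.functional.cross_entropy(self.backbone.head(cls_token), labels)

        # Positional-label targets: patch i should be classified as position i.
        b, n, _ = patch_tokens.shape
        pos_targets = torch.arange(n, device=images.device).expand(b, n)
        pos_logits = self.pos_head(patch_tokens)                # (B, N, N)
        pos_loss = nn.functional.cross_entropy(
            pos_logits.reshape(b * n, n), pos_targets.reshape(b * n)
        )
        return cls_loss + self.lambda_pos * pos_loss
```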
Pages: 3516-3524
Number of pages: 9
Related Papers
50 items in total
  • [41] Unsupervised object discovery with pseudo label generated using K-means and self-supervised transformer
    Lim, SeongTaek
    Park, JaeEon
    Lee, MinYoung
    Lee, HongChul
    NEUROCOMPUTING, 2023, 545
  • [42] Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis
    Wang, Ruiqing
    Yang, Qimeng
    Tian, Shengwei
    Yu, Long
    He, Xiaoyu
    Wang, Bo
    NEUROCOMPUTING, 2025, 618
  • [43] Self-supervised Label Augmentation via Input Transformations
    Lee, Hankook
    Hwang, Sung Ju
    Shin, Jinwoo
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [44] Self-supervised knowledge distillation for complementary label learning
    Liu, Jiabin
    Li, Biao
    Lei, Minglong
    Shi, Yong
    NEURAL NETWORKS, 2022, 155 : 318 - 327
  • [45] Self-supervised vision transformer-based few-shot learning for facial expression recognition
    Chen, Xuanchi
    Zheng, Xiangwei
    Sun, Kai
    Liu, Weilong
    Zhang, Yuang
    INFORMATION SCIENCES, 2023, 634 : 206 - 226
  • [46] Identification of Dental Lesions Using Self-Supervised Vision Transformer in Radiographic X-ray Images
    Li Y.
    Zhao H.
    Yang D.
    Du S.
    Cui X.
    Zhang J.
    Computer-Aided Design and Applications, 2024, 21 (S23): 332 - 342
  • [47] ST-VTON: Self-supervised vision transformer for image-based virtual try-on
    Chong, Zheng
    Mo, Lingfei
    IMAGE AND VISION COMPUTING, 2022, 127
  • [48] SERE: Exploring Feature Self-Relation for Self-Supervised Transformer
    Li, Zhong-Yu
    Gao, Shanghua
    Cheng, Ming-Ming
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15619 - 15631
  • [49] Self-Supervised Domain Adaptation for Computer Vision Tasks
    Xu, Jiaolong
    Xiao, Liang
    Lopez, Antonio M.
    IEEE ACCESS, 2019, 7 : 156694 - 156706
  • [50] Self-supervised learning in cooperative stereo vision correspondence
    Decoux, B
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 1997, 8 (01) : 101 - 111