SERE: Exploring Feature Self-Relation for Self-Supervised Transformer

Cited by: 3

Authors:

Li, Zhong-Yu [1]

Gao, Shanghua [1]

Cheng, Ming-Ming [1]

Affiliation:

[1] Nankai Univ, TMCC, CS, Tianjin 300350, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 12

Keywords:

Training; Self-supervised learning; Task analysis; Feature extraction; Convolutional neural networks; Transformers; Semantics; vision transformer; feature self-relation;

DOI:

10.1109/TPAMI.2023.3309979

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning representations with self-supervision for convolutional networks (CNN) has been validated to be effective for vision tasks. As an alternative to CNN, vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViT. Still, most works follow self-supervised strategies designed for CNN, e.g., instance-level discrimination of samples, but they ignore the properties of ViT. We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViT. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations, i.e., spatial/channel self-relations, for self-supervised learning. Self-relation based learning further enhances the relation modeling ability of ViT, resulting in stronger representations that stably improve performance on multiple downstream tasks.
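As a rough illustration of what "spatial/channel self-relations" could look like in practice, the sketch below computes a token-by-token (spatial) and a channel-by-channel relation matrix from a small feature map, then compares the relations of two views with a cross-entropy term. This is a minimal sketch under stated assumptions, not the authors' implementation: the cosine-similarity-plus-softmax form, the temperature value, the cross-entropy objective, and all function names (`self_relation`, `relation_cross_entropy`, and so on) are hypothetical choices made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(row, temperature):
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def self_relation(vectors, temperature=0.5):
    """Row-normalized cosine-similarity matrix among the given vectors.

    Assumption: a relation is modeled as softmax(cosine similarity / T).
    """
    unit = [l2_normalize(v) for v in vectors]
    sims = [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]
    return [softmax(row, temperature) for row in sims]


def transpose(matrix):
    """Swap the token and channel axes of a tokens-by-channels matrix."""
    return [list(col) for col in zip(*matrix)]


def relation_cross_entropy(target, pred, eps=1e-12):
    """Mean cross-entropy between two relation matrices, row by row."""
    total = 0.0
    for t_row, p_row in zip(target, pred):
        total -= sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
    return total / len(target)


# Toy features for two augmented views: 3 tokens (patches) x 2 channels.
view_a = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9]]

spatial_a = self_relation(view_a)              # 3 x 3, token-to-token
spatial_b = self_relation(view_b)
channel_a = self_relation(transpose(view_a))   # 2 x 2, channel-to-channel
channel_b = self_relation(transpose(view_b))

# Hypothetical objective: align the relations of view B with those of view A.
loss = (relation_cross_entropy(spatial_a, spatial_b)
        + relation_cross_entropy(channel_a, channel_b))
print(f"self-relation loss: {loss:.4f}")
```

The point of the sketch is only that the supervision signal is a relation matrix rather than the embedding itself; in a real training setup the features would come from a ViT backbone and the target branch would typically not receive gradients.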

Pages: 15619-15631

Page count: 13
