SERE: Exploring Feature Self-Relation for Self-Supervised Transformer

Cited by: 3
Authors
Li, Zhong-Yu [1]
Gao, Shanghua [1]
Cheng, Ming-Ming [1]
Affiliations
[1] Nankai Univ, TMCC, CS, Tianjin 300350, Peoples R China
Keywords
Training; Self-supervised learning; Task analysis; Feature extraction; Convolutional neural networks; Transformers; Semantics; vision transformer; feature self-relation;
DOI
10.1109/TPAMI.2023.3309979
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning representations with self-supervision for convolutional networks (CNN) has been validated to be effective for vision tasks. As an alternative to CNN, vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViT. Still, most works follow self-supervised strategies designed for CNN, e.g., instance-level discrimination of samples, but they ignore the properties of ViT. We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViT. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations, i.e., spatial/channel self-relations, for self-supervised learning. Self-relation based learning further enhances the relation modeling ability of ViT, resulting in stronger representations that stably improve performance on multiple downstream tasks.
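The abstract does not spell out how the self-relations are computed, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes each view yields patch tokens of shape (N, C), that a spatial self-relation is a softmax-normalized patch-to-patch similarity matrix, that a channel self-relation is the analogous channel-to-channel matrix, and that the two views are already spatially aligned. The function names, the temperature value, and the cross-entropy alignment loss are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis):
    # Scale vectors along `axis` to unit length (epsilon avoids division by zero).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def spatial_self_relation(tokens, temperature=0.1):
    # tokens: (N, C). Returns (N, N); row i is a distribution over patches
    # describing how patch i relates to every patch (assumed formulation).
    t = l2_normalize(tokens, axis=1)
    return softmax(t @ t.T / temperature, axis=-1)

def channel_self_relation(tokens, temperature=0.1):
    # tokens: (N, C). Returns (C, C); row j is a distribution over channels
    # describing how channel j relates to every channel (assumed formulation).
    t = l2_normalize(tokens, axis=0)
    return softmax(t.T @ t / temperature, axis=-1)

def relation_alignment_loss(rel_pred, rel_target):
    # Row-wise cross-entropy between two relation matrices, averaged over rows.
    # A hypothetical stand-in for whatever objective the paper actually uses.
    return float(-(rel_target * np.log(rel_pred + 1e-12)).sum(axis=-1).mean())

# Two synthetic "views": the second is a lightly perturbed copy of the first.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(16, 8))                   # 16 patches, 8 channels
view_b = view_a + 0.05 * rng.normal(size=(16, 8))

spatial_a = spatial_self_relation(view_a)
spatial_b = spatial_self_relation(view_b)
channel_a = channel_self_relation(view_a)
channel_b = channel_self_relation(view_b)

# Supervise on the relations themselves rather than only on the embeddings.
loss = (relation_alignment_loss(spatial_a, spatial_b)
        + relation_alignment_loss(channel_a, channel_b))
print(f"self-relation loss: {loss:.4f}")
```

In a real training setup the target view would typically come from a separate branch with gradients stopped, and differently cropped views would need their overlapping regions matched before comparing spatial relations; both details are omitted here.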
Pages: 15619 - 15631
Number of pages: 13
Related papers
50 in total
  • [21] Self-supervised relation extraction from the web
    Feldman, Ronen
    Rosenfeld, Benjamin
    Soderland, Stephen
    Etzioni, Oren
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2006, 4203 : 755 - 764
  • [22] TFDEPTH: SELF-SUPERVISED MONOCULAR DEPTH ESTIMATION WITH MULTI-SCALE SELECTIVE TRANSFORMER FEATURE FUSION
    Hu, Hongli
    Miao, Jun
    Zhu, Guanghu
    Yan, Je
    Chu, Jun
    IMAGE ANALYSIS & STEREOLOGY, 2024, 43 (02): : 139 - 149
  • [23] A Self-Supervised Transformer With Feature Fusion for SAR Image Semantic Segmentation in Marine Aquaculture Monitoring
    Fan, Jianchao
    Zhou, Jianlin
    Wang, Xinzhe
    Wang, Jun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [24] Concurrent discrimination and alignment for self-supervised feature learning
    Dutta, Anjan
    Mancini, Massimiliano
    Akata, Zeynep
    arXiv, 2021,
  • [25] An Improved Self-Supervised Framework for Feature Point Detection
    Wu, Yunhui
    Li, Jun
    ELEVENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2019), 2019, 11179
  • [26] Self-supervised feature matched virtual try-on
    Jiang, Shiyi
    Xu, Yang
    Li, Danyang
    Fan, Runze
    JOURNAL OF COMPUTATIONAL DESIGN AND ENGINEERING, 2023, 10 (05) : 1958 - 1969
  • [27] Self-Supervised Feature Specific Neural Matrix Completion
    Aktukmak, Mehmet
    Mercier, Samuel M.
    Uysal, Ismail
    IEEE ACCESS, 2020, 8 : 198168 - 198177
  • [28] Grid Feature Jigsaw for Self-supervised Image Clustering
    Song, Zijie
    Hu, Zhenzhen
    Hong, Richang
    Proceedings of the International Joint Conference on Neural Networks, 2023, 2023-June
  • [29] Grid Feature Jigsaw for Self-supervised Image Clustering
    Song, Zijie
    Hu, Zhenzhen
    Hong, Richang
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [30] Concurrent Discrimination and Alignment for Self-Supervised Feature Learning
    Dutta, Anjan
    Mancini, Massimiliano
    Akata, Zeynep
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2189 - 2198