SERE: Exploring Feature Self-Relation for Self-Supervised Transformer

Cited by: 3
Authors
Li, Zhong-Yu [1 ]
Gao, Shanghua [1 ]
Cheng, Ming-Ming [1 ]
Affiliations
[1] Nankai Univ, TMCC, CS, Tianjin 300350, Peoples R China
Keywords
Training; Self-supervised learning; Task analysis; Feature extraction; Convolutional neural networks; Transformers; Semantics; vision transformer; feature self-relation;
DOI
10.1109/TPAMI.2023.3309979
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning representations with self-supervision for convolutional neural networks (CNNs) has proven effective for vision tasks. As an alternative to CNNs, vision transformers (ViTs) have strong representation ability thanks to spatial self-attention and channel-level feed-forward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViTs. Still, most works follow self-supervised strategies designed for CNNs, e.g., instance-level discrimination of samples, while ignoring properties specific to ViTs. We observe that relational modeling on the spatial and channel dimensions distinguishes ViTs from other networks. To enforce this property, we explore feature SElf-RElation (SERE) for training self-supervised ViTs. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize feature self-relations, i.e., spatial and channel self-relations, as the learning target. Self-relation-based learning further enhances the relation modeling ability of ViTs, yielding stronger representations that stably improve performance on multiple downstream tasks.
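To make the notion of feature self-relation concrete, the following is a minimal NumPy sketch (not the authors' implementation; the function names, the cosine normalization, and the temperature `tau` are illustrative assumptions). Given ViT token features of shape (N tokens, C channels), the spatial self-relation is a token-to-token similarity distribution, and the channel self-relation is the same computation applied to the transposed features:

```python
import numpy as np

def spatial_self_relation(feats, tau=0.1):
    """Token-to-token self-relation of one view.

    feats: (N, C) array of token features.
    Returns an (N, N) row-stochastic relation matrix
    (softmax over temperature-scaled cosine similarities).
    """
    # L2-normalize each token along the channel dimension
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / tau  # (N, N) scaled cosine similarities
    # numerically stable row-wise softmax
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def channel_self_relation(feats, tau=0.1):
    """Channel-to-channel self-relation: treat each channel's
    spatial response map as a vector and relate channels."""
    return spatial_self_relation(feats.T, tau)
```

In a SERE-style objective, relation matrices from two augmented views would then be aligned (e.g., with a cross-entropy or similarity loss) instead of, or in addition to, aligning the raw embeddings themselves.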
Pages: 15619-15631
Page count: 13
Related Papers
50 records
  • [1] Self-supervised Video Transformer
    Ranasinghe, Kanchana
    Naseer, Muzammal
    Khan, Salman
    Khan, Fahad Shahbaz
    Ryoo, Michael S.
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2864 - 2874
  • [2] On Feature Decorrelation in Self-Supervised Learning
    Hua, Tianyu
    Wang, Wenxiao
    Xue, Zihui
    Ren, Sucheng
    Wang, Yue
    Zhao, Hang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9578 - 9588
  • [3] SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction
    Hu, Xuming
    Wen, Lijie
    Xu, Yusong
    Zhang, Chenwei
    Yu, Philip S.
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3673 - 3682
  • [4] Positional Label for Self-Supervised Vision Transformer
    Zhang, Zhemin
    Gong, Xun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3516 - 3524
  • [5] SSAST: Self-Supervised Audio Spectrogram Transformer
    Gong, Yuan
    Lai, Cheng-I Jeff
    Chung, Yu-An
    Glass, James
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 10699 - 10709
  • [6] Self-Supervised Hypergraph Transformer for Recommender Systems
    Xia, Lianghao
    Huang, Chao
    Zhang, Chuxu
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 2100 - 2109
  • [7] Self-Supervised Graph Transformer for Deepfake Detection
    Khormali, Aminollah
    Yuan, Jiann-Shiun
    IEEE ACCESS, 2024, 12 : 58114 - 58127
  • [8] Geometrized Transformer for Self-Supervised Homography Estimation
    Liu, Jiazhen
    Li, Xirong
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9522 - 9531
  • [9] SFT: Few-Shot Learning via Self-Supervised Feature Fusion With Transformer
    Lim, Jit Yan
    Lim, Kian Ming
    Lee, Chin Poo
    Tan, Yong Xuan
    IEEE ACCESS, 2024, 12 : 86690 - 86703
  • [10] Digging Into Self-Supervised Learning of Feature Descriptors
    Melekhov, Iaroslav
    Laskar, Zakaria
    Li, Xiaotian
    Wang, Shuzhe
    Kannala, Juho
    2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021), 2021, : 1144 - 1155