Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Times Cited: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision
DOI
10.1049/csy2.70002
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored in knowledge distillation between heterogeneous deep models, mainly because of the significant architectural discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first MIM-based heterogeneous self-supervised knowledge distillation (HSKD) method, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method builds a bridge between transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representations the teacher infers over masked modelling. HSKD is a simple yet effective learning paradigm for learning the visual representation and data distribution of heterogeneous teacher models, which can be pre-trained using advanced self-supervised methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
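The abstract's recipe can be made concrete: a frozen transformer teacher encodes the full image, while the CNN student sees only a masked copy and is trained to regress the teacher's feature map. Below is a minimal PyTorch sketch of that objective. The MaskedDistiller class, the 0.6 mask ratio, the 1x1 projection head, and the MSE feature-reconstruction loss are illustrative assumptions, not the paper's verified implementation, and the dense zero-masking here only approximates the sparse convolution the actual student uses to skip masked regions.

```python
# Minimal sketch of MIM-based heterogeneous distillation (HSKD-style).
# All names, the mask ratio, and the loss are assumptions for illustration;
# the paper's student is a UNet-style CNN with sparse convolution, which
# this dense zero-masking only approximates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedDistiller(nn.Module):
    def __init__(self, teacher, student, t_dim, s_dim,
                 mask_ratio=0.6, patch=32):
        super().__init__()
        self.teacher, self.student = teacher, student
        for p in self.teacher.parameters():
            p.requires_grad = False          # the teacher stays frozen
        self.proj = nn.Conv2d(s_dim, t_dim, kernel_size=1)  # channel align
        self.mask_ratio, self.patch = mask_ratio, patch

    def random_mask(self, x):
        # Zero out random image patches; masked regions carry no signal.
        b, _, h, w = x.shape
        keep = (torch.rand(b, 1, h // self.patch, w // self.patch,
                           device=x.device) > self.mask_ratio).float()
        return x * F.interpolate(keep, size=(h, w), mode="nearest")

    def forward(self, img):
        with torch.no_grad():
            t_feat = self.teacher(img)       # full view -> (B, t_dim, h, w)
        s_feat = self.proj(self.student(self.random_mask(img)))
        s_feat = F.adaptive_avg_pool2d(s_feat, t_feat.shape[-2:])
        return F.mse_loss(s_feat, t_feat)    # feature-reconstruction loss

# Toy usage with stand-in backbones that emit 4-D feature maps.
teacher = nn.Conv2d(3, 768, kernel_size=16, stride=16)  # ViT-like stride-16 map
student = nn.Sequential(nn.Conv2d(3, 512, 3, padding=1),
                        nn.ReLU(), nn.MaxPool2d(2))
loss = MaskedDistiller(teacher, student, t_dim=768, s_dim=512)(
    torch.randn(2, 3, 224, 224))
loss.backward()   # gradients reach the student and projection head only
```

Pooling the student's output to the teacher's spatial grid is one simple way to reconcile heterogeneous feature resolutions; the actual UNet-style decoder in the paper would produce aligned features directly.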
Pages: 11