Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Cited by: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision;
DOI
10.1049/csy2.70002
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Small convolutional neural network (CNN)-based models usually require knowledge transferred from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored for knowledge distillation between heterogeneous deep models, mainly because of the significant discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method bridges transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representation inferred by the teacher over masked modelling. It is a simple yet effective paradigm for learning the visual representation and data distribution from heterogeneous teacher models, which can be pre-trained using advanced self-supervised methods. Extensive experiments show that HSKD adapts well to various models and sizes, consistently achieving state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
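To make the abstract's idea concrete, below is a minimal PyTorch sketch of MIM-based heterogeneous distillation as the abstract describes it: a frozen transformer teacher encodes the full image, a UNet-style CNN student sees a masked image, and the student is trained to match teacher features at masked positions. This is an illustrative assumption of the general recipe, not the paper's implementation; all names (ToyTransformerTeacher, UNetStudent, PATCH, MASK_RATIO, distill_step) are hypothetical, the toy transformer stands in for a large self-supervised teacher, and ordinary dense convolutions over zeroed-out patches stand in for the sparse convolutions used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH = 16        # assumed patch size (not specified in the abstract)
MASK_RATIO = 0.6  # assumed mask ratio

class ToyTransformerTeacher(nn.Module):
    """Stand-in for a large self-supervised transformer teacher."""
    def __init__(self, dim=192, depth=4, heads=3):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=PATCH, stride=PATCH)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    @torch.no_grad()  # the teacher stays frozen during distillation
    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens)

class UNetStudent(nn.Module):
    """UNet-style CNN student; dense convs stand in for the paper's sparse convs."""
    def __init__(self, dim=192):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())     # 224 -> 112
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())    # -> 56
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())   # -> 28
        self.enc4 = nn.Sequential(nn.Conv2d(128, 128, 3, 2, 1), nn.ReLU())  # -> 14
        self.dec = nn.Sequential(nn.ConvTranspose2d(128, 128, 2, 2), nn.ReLU())  # 14 -> 28
        self.proj = nn.Conv2d(256, dim, 3, 2, 1)  # fuse skip, back to the 14x14 token grid

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        e4 = self.enc4(e3)
        d = self.dec(e4)                          # 28x28, matches e3 (skip connection)
        f = self.proj(torch.cat([d, e3], dim=1))  # (B, dim, 14, 14)
        return f.flatten(2).transpose(1, 2)       # (B, N, dim), aligned with teacher tokens

def random_patch_mask(batch, n_patches, ratio):
    """Boolean mask per sample: True marks a masked patch."""
    k = int(n_patches * ratio)
    ids = torch.rand(batch, n_patches).argsort(dim=1)[:, :k]
    mask = torch.zeros(batch, n_patches, dtype=torch.bool)
    return mask.scatter_(1, ids, True)

def distill_step(teacher, student, images):
    B, _, H, W = images.shape
    grid = H // PATCH
    mask = random_patch_mask(B, grid * grid, MASK_RATIO)  # (B, N)
    # Zero out masked patches in the student input; a real sparse-conv student
    # would skip these regions entirely instead of convolving over zeros.
    pix = F.interpolate(mask.view(B, 1, grid, grid).float(),
                        scale_factor=PATCH, mode="nearest")
    target = teacher(images)            # teacher sees the full image
    pred = student(images * (1 - pix))  # student sees the masked image
    # MIM-style distillation: match teacher features at masked positions only.
    return F.mse_loss(pred[mask], target[mask])

# Dummy training step to show the moving parts fit together.
teacher, student = ToyTransformerTeacher(), UNetStudent()
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss = distill_step(teacher, student, torch.randn(2, 3, 224, 224))
loss.backward()
opt.step()
print(f"distillation loss: {loss.item():.4f}")

Restricting the loss to masked token positions is the usual MIM-style target; the paper's actual losses, mask ratio, feature-alignment layers, and sparse-convolution backbone may differ from this sketch.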
Pages: 11