Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Cited by: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision;
DOI
10.1049/csy2.70002
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored for knowledge distillation between heterogeneous deep models, mainly because of the significant discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method bridges transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representation inferred by the teacher over masked modelling. HSKD is a simple yet effective learning paradigm for learning the visual representation and data distribution from heterogeneous teacher models, which can be pre-trained with advanced self-supervised methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
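To make the masked-feature mimicking idea above concrete, the following is a minimal, hypothetical PyTorch sketch. It is not the authors' released code: ToyTeacher, ToyStudent, distill_step, PATCH, DIM, and mask_ratio are illustrative placeholders, and zeroing masked pixels in a dense CNN is used here as a simple stand-in for the sparse convolution the paper uses for the UNet-style student.

```python
# Hypothetical sketch of MIM-based heterogeneous distillation (not the authors' code).
# A frozen transformer-style teacher embeds the full image into patch tokens; a small
# CNN student sees only the unmasked pixels and is trained to predict the teacher's
# patch features at the masked locations.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, DIM = 16, 384  # assumed patch size and teacher embedding dimension


class ToyTeacher(nn.Module):
    """Stand-in for a pre-trained ViT teacher: image -> (N, L, DIM) patch tokens."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)  # (N, L, DIM)


class ToyStudent(nn.Module):
    """Stand-in for the UNet-style CNN student; predicts one feature per patch."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, DIM, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(DIM, DIM, 1)

    def forward(self, x):
        f = self.head(self.encoder(x))                       # stride-8 feature map
        f = F.adaptive_avg_pool2d(f, x.shape[-1] // PATCH)   # one cell per patch
        return f.flatten(2).transpose(1, 2)                  # (N, L, DIM)


def distill_step(teacher, student, images, mask_ratio=0.6):
    N, _, H, W = images.shape
    L = (H // PATCH) * (W // PATCH)
    # Random patch mask: True = masked (exactly mask_ratio * L patches per image).
    mask = torch.rand(N, L, device=images.device).argsort(1) < int(L * mask_ratio)
    # Zero out masked patches in the student's input (dense stand-in for sparse conv).
    m = mask.float().view(N, 1, H // PATCH, W // PATCH)
    m = F.interpolate(m, scale_factor=PATCH, mode="nearest")
    student_in = images * (1.0 - m)
    with torch.no_grad():
        target = teacher(images)        # teacher tokens from the full image
    pred = student(student_in)
    # Mimic the teacher only at locations the student could not see.
    return F.mse_loss(pred[mask], target[mask])


if __name__ == "__main__":
    teacher, student = ToyTeacher().eval(), ToyStudent()
    loss = distill_step(teacher, student, torch.randn(2, 3, 224, 224))
    loss.backward()
    print(float(loss))
```

In this reading, the distillation signal is a regression loss between the student's predicted patch features and the frozen teacher's features on masked regions only, so the student must infer occluded content the way an MIM decoder would; in practice the student would be a sparse-convolution UNet and the teacher a self-supervised pre-trained transformer, as described in the abstract.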
Pages: 11
Related Papers
50 records in total
  • [31] A Novel Driver Distraction Behavior Detection Method Based on Self-Supervised Learning With Masked Image Modeling
    Zhang, Yingzhi
    Li, Taiguo
    Li, Chao
    Zhou, Xinghong
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (04): 6056 - 6071
  • [32] DisRot: boosting the generalization capability of few-shot learning via knowledge distillation and self-supervised learning
    Ma, Chenyu
    Jia, Jinfang
    Huang, Jianqiang
    Wu, Li
    Wang, Xiaoying
    MACHINE VISION AND APPLICATIONS, 2024, 35 (03)
  • [33] DeSD: Self-Supervised Learning with Deep Self-Distillation for 3D Medical Image Segmentation
    Ye, Yiwen
    Zhang, Jianpeng
    Chen, Ziyang
    Xia, Yong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT IV, 2022, 13434 : 545 - 555
  • [34] Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
    Zhang, Chenrui
    Peng, Yuxin
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 1135 - 1141
  • [35] KD-MVS: Knowledge Distillation Based Self-supervised Learning for Multi-view Stereo
    Ding, Yikang
    Zhu, Qingtian
    Liu, Xiangyue
    Yuan, Wentao
    Zhang, Haotian
    Zhang, Chi
    COMPUTER VISION, ECCV 2022, PT XXXI, 2022, 13691 : 630 - 646
  • [36] On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
    Yang, Gene-Ping
    Gu, Yue
    Tang, Qingming
    Du, Dongsu
    Liu, Yuzong
    INTERSPEECH 2023, 2023, : 1623 - 1627
  • [37] Deep Unpaired Blind Image Super-Resolution Using Self-supervised Learning and Exemplar Distillation
    Dong, Jiangxin
    Bai, Haoran
    Tang, Jinhui
    Pan, Jinshan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023
  • [38] KNOWLEDGE DISTILLATION FOR NEURAL TRANSDUCERS FROM LARGE SELF-SUPERVISED PRE-TRAINED MODELS
    Yang, Xiaoyu
    Li, Qiujia
    Woodland, Philip C.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8527 - 8531
  • [39] Point-MPP: Point Cloud Self-Supervised Learning From Masked Position Prediction
    Fan, Songlin
    Gao, Wei
    Li, Ge
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [40] Self-supervised learning based on StyleGAN for medical image classification on small labeled dataset
    Fan, Zong
    Wang, Zhimin
    Zhang, Chaojie
    Ozbey, Muzaffer
    Villa, Umberto
    Hao, Yao
    Zhang, Zhongwei
    Wang, Xiaowei
    Lia, Hua
    MEDICAL IMAGING 2024: IMAGE PROCESSING, 2024, 12926