Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Cited by: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision;
DOI
10.1049/csy2.70002
CLC Number
TP [automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored for knowledge distillation between heterogeneous deep models, mainly because of the significant architectural discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method bridges transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representations inferred by the teacher under masked modelling. HSKD is a simple yet effective paradigm for learning the visual representations and data distribution of heterogeneous teacher models, which can themselves be pre-trained with advanced self-supervised methods. Extensive experiments show that it adapts well to various model architectures and sizes, consistently achieving state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
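As a concrete illustration of the distillation scheme the abstract describes, the following PyTorch sketch pairs a frozen transformer teacher with a ResNet-50 student that must reconstruct the teacher's patch features from a masked input. All concrete choices here (a timm ViT-B/16 teacher, a torchvision ResNet-50 trunk, 60% patch masking, a smooth-L1 loss, and dense convolutions over a zero-filled image standing in for the paper's sparse convolutions and UNet-style decoder) are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of MIM-based heterogeneous distillation; see assumptions above.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
import torchvision.models as tvm

PATCH, MASK_RATIO, GRID = 16, 0.6, 14  # 224x224 input -> 14x14 patch grid

def random_patch_mask(imgs, patch=PATCH, ratio=MASK_RATIO):
    """Zero out a random subset of non-overlapping patches (MIM-style masking)."""
    b, _, h, w = imgs.shape
    keep = (torch.rand(b, 1, h // patch, w // patch, device=imgs.device) > ratio).float()
    mask = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return imgs * mask, mask  # mask == 0 marks the patches the student must infer

class Student(nn.Module):
    """ResNet-50 trunk plus a light decoder head projecting into the teacher's
    token space; a dense stand-in for the sparse-conv UNet-style student."""
    def __init__(self, teacher_dim=768):
        super().__init__()
        trunk = tvm.resnet50(weights=None)
        self.trunk = nn.Sequential(*list(trunk.children())[:-2])  # -> [B, 2048, 7, 7]
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),           # 7x7 -> 14x14
            nn.Conv2d(2048, teacher_dim, kernel_size=1),
        )

    def forward(self, x):
        return self.decode(self.trunk(x))                          # [B, 768, 14, 14]

# Frozen transformer teacher; any self-supervised checkpoint (e.g. MAE) would fit.
teacher = timm.create_model("vit_base_patch16_224", pretrained=True)
teacher.eval().requires_grad_(False)

@torch.no_grad()
def teacher_targets(imgs):
    """Patch-token features from the frozen teacher, reshaped to a 14x14 map."""
    tokens = teacher.forward_features(imgs)[:, 1:]  # drop class token: [B, 196, 768]
    return tokens.transpose(1, 2).reshape(imgs.size(0), -1, GRID, GRID)

def distill_step(imgs, student, optimizer):
    masked, mask = random_patch_mask(imgs)
    target = teacher_targets(imgs)          # teacher encodes the full image
    pred = student(masked)                  # student only sees the masked image
    # Pool the pixel mask to the feature grid; 1 where a patch was hidden.
    hidden = 1.0 - F.adaptive_max_pool2d(mask, (GRID, GRID))
    per_cell = F.smooth_l1_loss(pred, target, reduction="none") * hidden
    loss = per_cell.sum() / (hidden.sum() * pred.size(1)).clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Following the usual MIM recipe, the decoder head would be discarded after distillation and the CNN trunk fine-tuned on the downstream task, which is presumably how the reported ImageNet-1K accuracies are obtained.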
Pages: 11
Related Papers (50 records in total)
  • [1] Image quality assessment based on self-supervised learning and knowledge distillation
    Sang, Qingbing
    Shu, Ziru
    Liu, Lixiong
    Hu, Cong
    Wu, Qin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 90
  • [2] Self-supervised knowledge distillation in counterfactual learning for VQA
    Bi, Yandong
    Jiang, Huajie
    Zhang, Hanfu
    Hu, Yongli
    Yin, Baocai
    PATTERN RECOGNITION LETTERS, 2024, 177 : 33 - 39
  • [3] Self-supervised knowledge distillation for complementary label learning
    Liu, Jiabin
    Li, Biao
    Lei, Minglong
    Shi, Yong
    NEURAL NETWORKS, 2022, 155 : 318 - 327
  • [4] Self-supervised heterogeneous graph learning with iterative similarity distillation
    Wang, Tianfeng
    Pan, Zhisong
    Hu, Guyu
    Xu, Kun
    Zhang, Yao
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [5] Remote Sensing Image Scene Classification via Self-Supervised Learning and Knowledge Distillation
    Zhao, Yibo
    Liu, Jianjun
    Yang, Jinlong
    Wu, Zebin
    REMOTE SENSING, 2022, 14 (19)
  • [6] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
    Wang, Rui
    Chen, Dongdong
    Wu, Zuxuan
    Chen, Yinpeng
    Dai, Xiyang
    Liu, Mengchen
    Yuan, Lu
    Jiang, Yu-Gang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6312 - 6322
  • [7] Distill on the Go: Online knowledge distillation in self-supervised learning
    Bhat, Prashant
    Arani, Elahe
    Zonooz, Bahram
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2672 - 2681
  • [8] Self-Supervised Learning With Adaptive Distillation for Hyperspectral Image Classification
    Yue, Jun
    Fang, Leyuan
    Rahmani, Hossein
    Ghamisi, Pedram
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [9] A Novel Knowledge Distillation Method for Self-Supervised Hyperspectral Image Classification
    Chi, Qiang
    Lv, Guohua
    Zhao, Guixin
    Dong, Xiangjun
    REMOTE SENSING, 2022, 14 (18)
  • [10] Self-Supervised Contrastive Learning for Camera-to-Radar Knowledge Distillation
    Wang, Wenpeng
    Campbell, Bradford
    Munir, Sirajum
    2024 20TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SMART SYSTEMS AND THE INTERNET OF THINGS, DCOSS-IOT 2024, 2024, : 154 - 161