Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Times Cited: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision
DOI
10.1049/csy2.70002
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored in knowledge distillation between heterogeneous deep models, mainly because of the significant architectural discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first MIM-based heterogeneous self-supervised knowledge distillation (HSKD) method, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method builds a bridge between transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representations the teacher infers over masked modelling. HSKD is a simple yet effective learning paradigm for learning the visual representation and data distribution of heterogeneous teacher models, which can be pre-trained using advanced self-supervised methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
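The abstract's recipe can be made concrete: a frozen transformer teacher encodes the full image, while the CNN student sees only a masked copy and is trained to regress the teacher's feature map. Below is a minimal PyTorch sketch of that objective. The MaskedDistiller class, the 0.6 mask ratio, the 1x1 projection head, and the MSE feature-reconstruction loss are illustrative assumptions, not the paper's verified implementation, and the dense zero-masking here only approximates the sparse convolution the actual student uses to skip masked regions.

```python
# Minimal sketch of MIM-based heterogeneous distillation (HSKD-style).
# All names, the mask ratio, and the loss are assumptions for illustration;
# the paper's student is a UNet-style CNN with sparse convolution, which
# this dense zero-masking only approximates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedDistiller(nn.Module):
    def __init__(self, teacher, student, t_dim, s_dim,
                 mask_ratio=0.6, patch=32):
        super().__init__()
        self.teacher, self.student = teacher, student
        for p in self.teacher.parameters():
            p.requires_grad = False          # the teacher stays frozen
        self.proj = nn.Conv2d(s_dim, t_dim, kernel_size=1)  # channel align
        self.mask_ratio, self.patch = mask_ratio, patch

    def random_mask(self, x):
        # Zero out random image patches; masked regions carry no signal.
        b, _, h, w = x.shape
        keep = (torch.rand(b, 1, h // self.patch, w // self.patch,
                           device=x.device) > self.mask_ratio).float()
        return x * F.interpolate(keep, size=(h, w), mode="nearest")

    def forward(self, img):
        with torch.no_grad():
            t_feat = self.teacher(img)       # full view -> (B, t_dim, h, w)
        s_feat = self.proj(self.student(self.random_mask(img)))
        s_feat = F.adaptive_avg_pool2d(s_feat, t_feat.shape[-2:])
        return F.mse_loss(s_feat, t_feat)    # feature-reconstruction loss

# Toy usage with stand-in backbones that emit 4-D feature maps.
teacher = nn.Conv2d(3, 768, kernel_size=16, stride=16)  # ViT-like stride-16 map
student = nn.Sequential(nn.Conv2d(3, 512, 3, padding=1),
                        nn.ReLU(), nn.MaxPool2d(2))
loss = MaskedDistiller(teacher, student, t_dim=768, s_dim=512)(
    torch.randn(2, 3, 224, 224))
loss.backward()   # gradients reach the student and projection head only
```

Pooling the student's output to the teacher's spatial grid is one simple way to reconcile heterogeneous feature resolutions; the actual UNet-style decoder in the paper would produce aligned features directly.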
Pages: 11