Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Cited by: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision;
DOI
10.1049/csy2.70002
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored for knowledge distillation between heterogeneous deep models, mainly because of the significant discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method bridges transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representation inferred by the teacher over masked modelling. HSKD is a simple yet effective learning paradigm for learning the visual representation and data distribution from heterogeneous teacher models, which can be pre-trained with advanced self-supervised methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
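To make the masked-feature mimicking idea above concrete, the following is a minimal, hypothetical PyTorch sketch. It is not the authors' released code: ToyTeacher, ToyStudent, distill_step, PATCH, DIM, and mask_ratio are illustrative placeholders, and zeroing masked pixels in a dense CNN is used here as a simple stand-in for the sparse convolution the paper uses for the UNet-style student.

```python
# Hypothetical sketch of MIM-based heterogeneous distillation (not the authors' code).
# A frozen transformer-style teacher embeds the full image into patch tokens; a small
# CNN student sees only the unmasked pixels and is trained to predict the teacher's
# patch features at the masked locations.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, DIM = 16, 384  # assumed patch size and teacher embedding dimension


class ToyTeacher(nn.Module):
    """Stand-in for a pre-trained ViT teacher: image -> (N, L, DIM) patch tokens."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)  # (N, L, DIM)


class ToyStudent(nn.Module):
    """Stand-in for the UNet-style CNN student; predicts one feature per patch."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, DIM, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(DIM, DIM, 1)

    def forward(self, x):
        f = self.head(self.encoder(x))                       # stride-8 feature map
        f = F.adaptive_avg_pool2d(f, x.shape[-1] // PATCH)   # one cell per patch
        return f.flatten(2).transpose(1, 2)                  # (N, L, DIM)


def distill_step(teacher, student, images, mask_ratio=0.6):
    N, _, H, W = images.shape
    L = (H // PATCH) * (W // PATCH)
    # Random patch mask: True = masked (exactly mask_ratio * L patches per image).
    mask = torch.rand(N, L, device=images.device).argsort(1) < int(L * mask_ratio)
    # Zero out masked patches in the student's input (dense stand-in for sparse conv).
    m = mask.float().view(N, 1, H // PATCH, W // PATCH)
    m = F.interpolate(m, scale_factor=PATCH, mode="nearest")
    student_in = images * (1.0 - m)
    with torch.no_grad():
        target = teacher(images)        # teacher tokens from the full image
    pred = student(student_in)
    # Mimic the teacher only at locations the student could not see.
    return F.mse_loss(pred[mask], target[mask])


if __name__ == "__main__":
    teacher, student = ToyTeacher().eval(), ToyStudent()
    loss = distill_step(teacher, student, torch.randn(2, 3, 224, 224))
    loss.backward()
    print(float(loss))
```

In this reading, the distillation signal is a regression loss between the student's predicted patch features and the frozen teacher's features on masked regions only, so the student must infer occluded content the way an MIM decoder would; in practice the student would be a sparse-convolution UNet and the teacher a self-supervised pre-trained transformer, as described in the abstract.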
Pages: 11
Related Papers
50 records in total
  • [31] A Novel Driver Distraction Behavior Detection Method Based on Self-Supervised Learning With Masked Image Modeling
    Zhang, Yingzhi
    Li, Taiguo
    Li, Chao
    Zhou, Xinghong
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (04): 6056 - 6071
  • [32] DisRot: boosting the generalization capability of few-shot learning via knowledge distillation and self-supervised learning
    Ma, Chenyu
    Jia, Jinfang
    Huang, Jianqiang
    Wu, Li
    Wang, Xiaoying
    MACHINE VISION AND APPLICATIONS, 2024, 35 (03)
  • [33] DeSD: Self-Supervised Learning with Deep Self-Distillation for 3D Medical Image Segmentation
    Ye, Yiwen
    Zhang, Jianpeng
    Chen, Ziyang
    Xia, Yong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT IV, 2022, 13434 : 545 - 555
  • [34] Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
    Zhang, Chenrui
    Peng, Yuxin
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 1135 - 1141
  • [35] KD-MVS: Knowledge Distillation Based Self-supervised Learning for Multi-view Stereo
    Ding, Yikang
    Zhu, Qingtian
    Liu, Xiangyue
    Yuan, Wentao
    Zhang, Haotian
    Zhang, Chi
    COMPUTER VISION, ECCV 2022, PT XXXI, 2022, 13691 : 630 - 646
  • [36] On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
    Yang, Gene-Ping
    Gu, Yue
    Tang, Qingming
    Du, Dongsu
    Liu, Yuzong
    INTERSPEECH 2023, 2023, : 1623 - 1627
  • [37] Deep Unpaired Blind Image Super-Resolution Using Self-supervised Learning and Exemplar Distillation
    Dong, Jiangxin
    Bai, Haoran
    Tang, Jinhui
    Pan, Jinshan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023
  • [38] KNOWLEDGE DISTILLATION FOR NEURAL TRANSDUCERS FROM LARGE SELF-SUPERVISED PRE-TRAINED MODELS
    Yang, Xiaoyu
    Li, Qiujia
    Woodland, Philip C.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8527 - 8531
  • [39] Point-MPP: Point Cloud Self-Supervised Learning From Masked Position Prediction
    Fan, Songlin
    Gao, Wei
    Li, Ge
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [40] Self-supervised learning based on StyleGAN for medical image classification on small labeled dataset
    Fan, Zong
    Wang, Zhimin
    Zhang, Chaojie
    Ozbey, Muzaffer
    Villa, Umberto
    Hao, Yao
    Zhang, Zhongwei
    Wang, Xiaowei
    Lia, Hua
    MEDICAL IMAGING 2024: IMAGE PROCESSING, 2024, 12926