Big2Small: Learning from masked image modelling with heterogeneous self-supervised knowledge distillation

Cited by: 0
Authors
Wang, Ziming [1 ]
Han, Shumin [2 ]
Wang, Xiaodi [2 ]
Hao, Jing [2 ]
Cao, Xianbin [1 ]
Zhang, Baochang [3 ,4 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Baidu Com, Beijing, Peoples R China
[3] Beihang Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Beijing, Peoples R China
Keywords
artificial intelligence; deep neural network; machine intelligence; machine learning; vision;
DOI
10.1049/csy2.70002
CLC Number
TP [automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Small convolutional neural network (CNN)-based models usually require transferring knowledge from a large model before they are deployed on computationally resource-limited edge devices. Masked image modelling (MIM) methods have achieved great success in various visual tasks but remain largely unexplored for knowledge distillation between heterogeneous deep models, mainly because of the significant architectural discrepancy between transformer-based large models and CNN-based small networks. In this paper, the authors develop the first heterogeneous self-supervised knowledge distillation (HSKD) method based on MIM, which efficiently transfers knowledge from large transformer models to small CNN-based models in a self-supervised fashion. The method bridges transformer-based models and CNNs by training a UNet-style student with sparse convolution, which effectively mimics the visual representations inferred by the teacher under masked modelling. HSKD is a simple yet effective paradigm for learning the visual representations and data distribution of heterogeneous teacher models, which can themselves be pre-trained with advanced self-supervised methods. Extensive experiments show that it adapts well to various model architectures and sizes, consistently achieving state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet-1K dataset, HSKD improves the accuracy of ResNet-50 (sparse) from 76.98% to 80.01%.
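As a concrete illustration of the distillation scheme the abstract describes, the following PyTorch sketch pairs a frozen transformer teacher with a ResNet-50 student that must reconstruct the teacher's patch features from a masked input. All concrete choices here (a timm ViT-B/16 teacher, a torchvision ResNet-50 trunk, 60% patch masking, a smooth-L1 loss, and dense convolutions over a zero-filled image standing in for the paper's sparse convolutions and UNet-style decoder) are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of MIM-based heterogeneous distillation; see assumptions above.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
import torchvision.models as tvm

PATCH, MASK_RATIO, GRID = 16, 0.6, 14  # 224x224 input -> 14x14 patch grid

def random_patch_mask(imgs, patch=PATCH, ratio=MASK_RATIO):
    """Zero out a random subset of non-overlapping patches (MIM-style masking)."""
    b, _, h, w = imgs.shape
    keep = (torch.rand(b, 1, h // patch, w // patch, device=imgs.device) > ratio).float()
    mask = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return imgs * mask, mask  # mask == 0 marks the patches the student must infer

class Student(nn.Module):
    """ResNet-50 trunk plus a light decoder head projecting into the teacher's
    token space; a dense stand-in for the sparse-conv UNet-style student."""
    def __init__(self, teacher_dim=768):
        super().__init__()
        trunk = tvm.resnet50(weights=None)
        self.trunk = nn.Sequential(*list(trunk.children())[:-2])  # -> [B, 2048, 7, 7]
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),           # 7x7 -> 14x14
            nn.Conv2d(2048, teacher_dim, kernel_size=1),
        )

    def forward(self, x):
        return self.decode(self.trunk(x))                          # [B, 768, 14, 14]

# Frozen transformer teacher; any self-supervised checkpoint (e.g. MAE) would fit.
teacher = timm.create_model("vit_base_patch16_224", pretrained=True)
teacher.eval().requires_grad_(False)

@torch.no_grad()
def teacher_targets(imgs):
    """Patch-token features from the frozen teacher, reshaped to a 14x14 map."""
    tokens = teacher.forward_features(imgs)[:, 1:]  # drop class token: [B, 196, 768]
    return tokens.transpose(1, 2).reshape(imgs.size(0), -1, GRID, GRID)

def distill_step(imgs, student, optimizer):
    masked, mask = random_patch_mask(imgs)
    target = teacher_targets(imgs)          # teacher encodes the full image
    pred = student(masked)                  # student only sees the masked image
    # Pool the pixel mask to the feature grid; 1 where a patch was hidden.
    hidden = 1.0 - F.adaptive_max_pool2d(mask, (GRID, GRID))
    per_cell = F.smooth_l1_loss(pred, target, reduction="none") * hidden
    loss = per_cell.sum() / (hidden.sum() * pred.size(1)).clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Following the usual MIM recipe, the decoder head would be discarded after distillation and the CNN trunk fine-tuned on the downstream task, which is presumably how the reported ImageNet-1K accuracies are obtained.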
Pages: 11
Related Papers (50 records in total)
  • [1] Image quality assessment based on self-supervised learning and knowledge distillation
    Sang, Qingbing
    Shu, Ziru
    Liu, Lixiong
    Hu, Cong
    Wu, Qin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 90
  • [2] Self-supervised knowledge distillation in counterfactual learning for VQA
    Bi, Yandong
    Jiang, Huajie
    Zhang, Hanfu
    Hu, Yongli
    Yin, Baocai
    PATTERN RECOGNITION LETTERS, 2024, 177 : 33 - 39
  • [3] Self-supervised knowledge distillation for complementary label learning
    Liu, Jiabin
    Li, Biao
    Lei, Minglong
    Shi, Yong
    NEURAL NETWORKS, 2022, 155 : 318 - 327
  • [4] Self-supervised heterogeneous graph learning with iterative similarity distillation
    Wang, Tianfeng
    Pan, Zhisong
    Hu, Guyu
    Xu, Kun
    Zhang, Yao
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [5] Remote Sensing Image Scene Classification via Self-Supervised Learning and Knowledge Distillation
    Zhao, Yibo
    Liu, Jianjun
    Yang, Jinlong
    Wu, Zebin
    REMOTE SENSING, 2022, 14 (19)
  • [6] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
    Wang, Rui
    Chen, Dongdong
    Wu, Zuxuan
    Chen, Yinpeng
    Dai, Xiyang
    Liu, Mengchen
    Yuan, Lu
    Jiang, Yu-Gang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6312 - 6322
  • [7] Distill on the Go: Online knowledge distillation in self-supervised learning
    Bhat, Prashant
    Arani, Elahe
    Zonooz, Bahram
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2672 - 2681
  • [8] Self-Supervised Learning With Adaptive Distillation for Hyperspectral Image Classification
    Yue, Jun
    Fang, Leyuan
    Rahmani, Hossein
    Ghamisi, Pedram
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [9] A Novel Knowledge Distillation Method for Self-Supervised Hyperspectral Image Classification
    Chi, Qiang
    Lv, Guohua
    Zhao, Guixin
    Dong, Xiangjun
    REMOTE SENSING, 2022, 14 (18)
  • [10] Self-Supervised Contrastive Learning for Camera-to-Radar Knowledge Distillation
    Wang, Wenpeng
    Campbell, Bradford
    Munir, Sirajum
    2024 20TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SMART SYSTEMS AND THE INTERNET OF THINGS, DCOSS-IOT 2024, 2024, : 154 - 161