DE-MKD: Decoupled Multi-Teacher Knowledge Distillation Based on Entropy

Cited: 2
Authors
Cheng, Xin [1 ]
Zhang, Zhiqiang [2 ]
Weng, Wei [3 ]
Yu, Wenxin [2 ]
Zhou, Jinjia [1 ]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo 1848584, Japan
[2] Southwest Univ Sci & Technol, Sch Sci & Technol, Mianyang 621010, Peoples R China
[3] Kanazawa Univ, Inst Liberal Arts & Sci, Kanazawa 9201192, Japan
Keywords
multi-teacher knowledge distillation; image classification; entropy; deep learning;
DOI
10.3390/math12111672
Chinese Library Classification
O1 [Mathematics];
Subject Classification Codes
0701; 070101;
Abstract
The complexity of deep neural network models (DNNs) severely limits their application on devices with limited computing and storage resources. Knowledge distillation (KD) is an attractive model compression technique that can effectively alleviate this problem. Multi-teacher knowledge distillation (MKD) aims to leverage the valuable and diverse knowledge distilled from multiple teacher networks to improve the performance of the student network. Existing approaches typically fuse the teachers' knowledge with simple schemes, such as averaging the prediction logits or applying sub-optimal weighting strategies. These techniques, however, cannot fully reflect the relative importance of the teachers and may even mislead the student's learning. To address this issue, we propose Decoupled Multi-Teacher Knowledge Distillation based on Entropy (DE-MKD). DE-MKD decouples the vanilla knowledge distillation loss and assigns each teacher an adaptive weight, derived from the entropy of its predictions, to reflect its importance. Furthermore, we extend the approach to distill intermediate features from multiple powerful but cumbersome teachers, further improving the lightweight student network. Extensive experiments on the publicly available CIFAR-100 image classification benchmark with various teacher-student network pairs demonstrate the effectiveness and flexibility of our approach. For instance, the VGG8|ShuffleNetV2 student trained with DE-MKD reaches 75.25%|78.86% top-1 accuracy with VGG13|WRN40-2 as the teacher, setting new performance records. Moreover, in both teacher-student pairs, the distilled student even outperforms its teacher.
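Note: the abstract describes entropy-based adaptive teacher weighting but does not state the formula; the following is a minimal PyTorch-style sketch of one plausible reading, not the authors' released code. The function name entropy_weighted_kd_loss, the softmax-over-negative-entropy weighting, and the use of a plain KL term (the decoupling of the KD loss into target and non-target parts is omitted for brevity) are all illustrative assumptions.

    # Hypothetical sketch: weight each teacher's KD loss by the (negative) entropy
    # of its softened prediction, so more confident teachers contribute more.
    import torch
    import torch.nn.functional as F

    def entropy_weighted_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
        """Combine per-teacher KD losses with per-sample, entropy-derived weights."""
        log_p_student = F.log_softmax(student_logits / temperature, dim=1)
        entropies, kd_losses = [], []
        for t_logits in teacher_logits_list:
            p_teacher = F.softmax(t_logits / temperature, dim=1)
            # Shannon entropy of the teacher's softened distribution, shape [batch]
            entropy = -(p_teacher * torch.log(p_teacher.clamp_min(1e-8))).sum(dim=1)
            # Per-sample KL divergence between teacher and student, shape [batch]
            kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
            entropies.append(entropy)
            kd_losses.append(kd)
        entropies = torch.stack(entropies)    # [num_teachers, batch]
        kd_losses = torch.stack(kd_losses)    # [num_teachers, batch]
        # Assumed weighting: lower entropy (higher confidence) -> larger weight.
        weights = F.softmax(-entropies, dim=0)
        return (weights * kd_losses).sum(dim=0).mean() * (temperature ** 2)

    # Toy usage with random logits standing in for real network outputs.
    if __name__ == "__main__":
        student = torch.randn(8, 100)               # batch of 8, 100 classes (CIFAR-100)
        teachers = [torch.randn(8, 100) for _ in range(3)]
        print(entropy_weighted_kd_loss(student, teachers).item())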
Pages: 10
Related Papers (50 in total)
  • [1] Decoupled Multi-teacher Knowledge Distillation based on Entropy
    Cheng, Xin
    Tang, Jialiang
    Zhang, Zhiqiang
    Yu, Wenxin
    Jiang, Ning
    Zhou, Jinjia
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [2] Anomaly detection based on multi-teacher knowledge distillation
    Ma, Ye
    Jiang, Xu
    Guan, Nan
    Yi, Wang
    JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 138
  • [3] Correlation Guided Multi-teacher Knowledge Distillation
    Shi, Luyao
    Jiang, Ning
    Tang, Jialiang
    Huang, Xinlei
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT IV, 2024, 14450 : 562 - 574
  • [4] Reinforced Multi-Teacher Selection for Knowledge Distillation
    Yuan, Fei
    Shou, Linjun
    Pei, Jian
    Lin, Wutao
    Gong, Ming
    Fu, Yan
    Jiang, Daxin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14284 - 14291
  • [5] Knowledge Distillation via Multi-Teacher Feature Ensemble
    Ye, Xin
    Jiang, Rongxin
    Tian, Xiang
    Zhang, Rui
    Chen, Yaowu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 566 - 570
  • [6] CONFIDENCE-AWARE MULTI-TEACHER KNOWLEDGE DISTILLATION
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4498 - 4502
  • [7] Adaptive multi-teacher multi-level knowledge distillation
    Liu, Yuang
    Zhang, Wei
    Wang, Jun
    NEUROCOMPUTING, 2020, 415 : 106 - 113
  • [8] Robust Semantic Segmentation With Multi-Teacher Knowledge Distillation
    Amirkhani, Abdollah
    Khosravian, Amir
    Masih-Tehrani, Masoud
    Kashiani, Hossein
    IEEE ACCESS, 2021, 9 : 119049 - 119066