Decoupled Multi-teacher Knowledge Distillation based on Entropy

Citations: 0
Authors
Cheng, Xin [1]
Tang, Jialiang [2]
Zhang, Zhiqiang [3]
Yu, Wenxin [3]
Jiang, Ning [3]
Zhou, Jinjia [1]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo, Japan
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China
[3] Southwest Univ Sci & Technol, Sch Comp Sci & Technol, Mianyang, Sichuan, Peoples R China
Keywords
Multi-teacher knowledge distillation; image classification; entropy; deep learning
DOI
10.1109/ISCAS58744.2024.10558141
CLC number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Multi-teacher knowledge distillation (MKD) aims to leverage the valuable and diverse knowledge provided by multiple teacher networks to improve the performance of the student network. Existing approaches typically rely on simple methods, such as averaging the prediction logits, or on sub-optimal weighting strategies to combine knowledge from multiple teachers. However, these techniques cannot fully reflect the importance of each teacher and may even mislead the student's learning. To address these issues, we propose a novel Decoupled Multi-teacher Knowledge Distillation based on Entropy (DE-MKD). DE-MKD decomposes the vanilla KD loss and assigns a weight to each teacher that reflects its importance, based on the entropy of its predictions. Furthermore, we extend the proposed approach to distill the intermediate features from the teachers to further improve the performance of the student network. Extensive experiments conducted on the publicly available CIFAR-100 image classification dataset demonstrate the effectiveness and flexibility of our proposed approach.
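The abstract only outlines the idea of weighting teachers by the entropy of their predictions; the exact decoupled loss used in DE-MKD is not given here. The snippet below is a minimal PyTorch sketch of entropy-weighted multi-teacher logit distillation under stated assumptions: the function name `entropy_weighted_mkd_loss`, the temperature `T`, and the softmax-over-negative-entropy weighting (confident, low-entropy teachers get larger weights) are all illustrative choices, not the authors' implementation, and the feature-level distillation and loss decoupling described in the paper are omitted.

```python
# Minimal sketch (not the authors' code) of entropy-weighted multi-teacher
# logit distillation. Assumption: a lower prediction entropy marks a more
# reliable teacher, so it receives a larger per-sample weight.
import torch
import torch.nn.functional as F

def entropy_weighted_mkd_loss(student_logits, teacher_logits_list, T=4.0):
    """KL distillation loss combined over teachers with entropy-based weights.

    student_logits: (B, C) tensor from the student network.
    teacher_logits_list: list of (B, C) tensors, one per teacher.
    """
    entropies = []
    for t_logits in teacher_logits_list:
        p = F.softmax(t_logits / T, dim=1)                    # teacher prediction
        ent = -(p * p.clamp_min(1e-12).log()).sum(dim=1)      # per-sample entropy
        entropies.append(ent)
    entropies = torch.stack(entropies, dim=0)                 # (num_teachers, B)

    # Illustrative weighting: softmax over negative entropy, so low-entropy
    # (confident) teachers dominate for each sample.
    weights = F.softmax(-entropies, dim=0)                    # (num_teachers, B)

    log_q = F.log_softmax(student_logits / T, dim=1)          # student log-probs
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        p = F.softmax(t_logits / T, dim=1)
        kl = F.kl_div(log_q, p, reduction="none").sum(dim=1)  # per-sample KL
        loss = loss + (w * kl).mean()
    return loss * (T ** 2)                                    # usual T^2 scaling
```

In this sketch the weights are computed per sample rather than per batch, so a teacher that is confident on some images and uncertain on others contributes accordingly; whether DE-MKD weights per sample or per teacher is not specified in the abstract.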
Pages: 5
Related papers (50 in total)
  • [31] Semi-supervised lung adenocarcinoma histopathology image classification based on multi-teacher knowledge distillation. Wang, Qixuan; Zhang, Yanjun; Lu, Jun; Li, Congsheng; Zhang, Yungang. PHYSICS IN MEDICINE AND BIOLOGY, 2024, 69 (18).
  • [32] Bi-Level Orthogonal Multi-Teacher Distillation. Gong, Shuyue; Wen, Weigang. ELECTRONICS, 2024, 13 (16).
  • [33] Multi-Teacher Knowledge Distillation for Compressed Video Action Recognition on Deep Neural Networks. Wu, Meng-Chieh; Chiu, Ching-Te; Wu, Kun-Hsuan. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 2202-2206.
  • [34] A Multi-teacher Knowledge Distillation Framework for Distantly Supervised Relation Extraction with Flexible Temperature. Fei, Hongxiao; Tan, Yangying; Huang, Wenti; Long, Jun; Huang, Jincai; Yang, Liu. WEB AND BIG DATA, PT II, APWEB-WAIM 2023, 2024, 14332: 103-116.
  • [35] Adaptive multi-teacher softened relational knowledge distillation framework for payload mismatch in image steganalysis. Yu, Lifang; Li, Yunwei; Weng, Shaowei; Tian, Huawei; Liu, Jing. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95.
  • [36] UNIC: Universal Classification Models via Multi-teacher Distillation. Sariyildiz, Mert Bulent; Weinzaepfel, Philippe; Lucas, Thomas; Larlus, Diane; Kalantidis, Yannis. COMPUTER VISION - ECCV 2024, PT IV, 2025, 15062: 353-371.
  • [37] Enhanced Accuracy and Robustness via Multi-teacher Adversarial Distillation. Zhao, Shiji; Yu, Jie; Sun, Zhenlong; Zhang, Bo; Wei, Xingxing. COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664: 585-602.
  • [38] LGFA-MTKD: Enhancing Multi-Teacher Knowledge Distillation with Local and Global Frequency Attention. Cheng, Xin; Zhou, Jinjia. INFORMATION, 2024, 15 (11).
  • [39] MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition. Gui, Shuangchun; Wang, Zhenkun; Chen, Jixiang; Zhou, Xun; Zhang, Chen; Cao, Yi. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (04): 1628-1639.
  • [40] Multi-Teacher Distillation With Single Model for Neural Machine Translation. Liang, Xiaobo; Wu, Lijun; Li, Juntao; Qin, Tao; Zhang, Min; Liu, Tie-Yan. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 992-1002.