From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels

Cited: 24
Authors
Yang, Zhendong [1,2]
Zeng, Ailing [2]
Li, Zhe [3]
Zhang, Tianke [1]
Yuan, Chun [1]
Li, Yu [2]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Int Digital Econ Acad (IDEA), Shenzhen, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
DOI
10.1109/ICCV51070.2023.01576
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Knowledge Distillation (KD) uses the teacher's logits as soft labels to guide the student, whereas self-KD does not need a real teacher to obtain such soft labels. This work unifies the formulations of the two tasks by decomposing and reorganizing the generic KD loss into a Normalized KD (NKD) loss and customized soft labels for both the target class (the image's category) and the non-target classes, named Universal Self-KD (USKD). Decomposing the KD loss, we find that its non-target term forces the student's non-target logits to match the teacher's, but the two sets of non-target logits have different sums, which prevents them from ever becoming identical. NKD normalizes the non-target logits to equalize their sums, and can be applied to both KD and self-KD to make better use of the soft labels for distillation. USKD generates customized soft labels for both target and non-target classes without a teacher: it smooths the student's target logit to obtain the soft target label and uses the rank of the intermediate feature to generate the soft non-target labels with Zipf's law. For KD with teachers, NKD achieves state-of-the-art performance on CIFAR-100 and ImageNet, boosting the ImageNet Top-1 accuracy of Res-18 from 69.90% to 71.96% with a Res-34 teacher. For self-KD without teachers, USKD is the first method that can be effectively applied to both CNN and ViT models with negligible additional time and memory cost, yielding new state-of-the-art results, such as 1.17% and 0.55% accuracy gains on ImageNet for MobileNet and DeiT-Tiny, respectively. Code is available at https://github.com/yzd-v/cls_KD.
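The abstract names two concrete mechanisms: NKD re-normalizes the non-target logits of student and teacher so both distributions sum to one before they are matched, and USKD builds teacher-free soft labels by smoothing the student's own target logit and assigning Zipf-law weights to the non-target classes by rank. The PyTorch sketch below is only an illustration of those two ideas under simplifying assumptions, not the authors' implementation (see https://github.com/yzd-v/cls_KD for that): the function names nkd_loss and zipf_soft_labels and the hyperparameters temperature, gamma, and target_weight are hypothetical, and in the paper the target-label weight is derived from the student's smoothed target logit rather than fixed to a constant.

import torch
import torch.nn.functional as F

def nkd_loss(student_logits, teacher_logits, target, temperature=1.0, gamma=1.5):
    # NKD-style loss sketch: a target-class term plus a non-target term in which
    # the non-target logits of both models are softmaxed over the non-target
    # classes only, so the two non-target distributions have the same (unit) sum.
    n, c = student_logits.shape
    s_prob = F.softmax(student_logits, dim=1)
    t_prob = F.softmax(teacher_logits, dim=1)

    # Target term: the teacher's target probability weights the student's
    # log-probability of the ground-truth class.
    s_target = s_prob.gather(1, target.unsqueeze(1))
    t_target = t_prob.gather(1, target.unsqueeze(1))
    target_loss = -(t_target * torch.log(s_target + 1e-12)).mean()

    # Non-target term: drop the target column, re-normalize with a temperature,
    # then match the two normalized distributions.
    mask = torch.ones_like(student_logits, dtype=torch.bool)
    mask.scatter_(1, target.unsqueeze(1), False)
    s_non = F.log_softmax(student_logits[mask].view(n, c - 1) / temperature, dim=1)
    t_non = F.softmax(teacher_logits[mask].view(n, c - 1) / temperature, dim=1)
    non_target_loss = -(t_non * s_non).sum(dim=1).mean() * temperature ** 2

    return target_loss + gamma * non_target_loss

def zipf_soft_labels(scores, target, target_weight=0.9):
    # USKD-style teacher-free soft labels sketch: rank the non-target classes by
    # a per-class score (e.g. logits from an intermediate feature) and assign
    # Zipf weights proportional to 1/rank; the target class keeps target_weight
    # of the probability mass (a fixed constant here, for illustration only).
    n, c = scores.shape
    mask = torch.ones_like(scores, dtype=torch.bool)
    mask.scatter_(1, target.unsqueeze(1), False)
    non_scores = scores[mask].view(n, c - 1)

    # Double argsort gives each class's rank (1 = highest score).
    ranks = non_scores.argsort(dim=1, descending=True).argsort(dim=1) + 1
    zipf = 1.0 / ranks.float()
    zipf = zipf / zipf.sum(dim=1, keepdim=True)

    soft = torch.zeros_like(scores)
    soft[mask] = ((1.0 - target_weight) * zipf).reshape(-1)
    soft.scatter_(1, target.unsqueeze(1), target_weight)
    return soft

For a batch of logits of shape (N, C) and integer labels of shape (N,), nkd_loss(student_logits, teacher_logits, labels) returns a scalar distillation loss and zipf_soft_labels(scores, labels) returns an (N, C) soft-label matrix whose rows sum to one.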
Pages: 17139-17148
Page count: 10
Related Papers
50 records in total
  • [21] Improving Knowledge Distillation With a Customized Teacher
    Tan, Chao
    Liu, Jie
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2290 - 2299
  • [22] SELF-KNOWLEDGE DISTILLATION VIA FEATURE ENHANCEMENT FOR SPEAKER VERIFICATION
    Liu, Bei
    Wang, Haoyu
    Chen, Zhengyang
    Wang, Shuai
    Qian, Yanmin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7542 - 7546
  • [23] Adaptive lightweight network construction method for Self-Knowledge Distillation
    Lu, Siyuan
    Zeng, Weiliang
    Li, Xueshi
    Ou, Jiajun
    NEUROCOMPUTING, 2025, 624
  • [24] Personalized Edge Intelligence via Federated Self-Knowledge Distillation
    Jin, Hai
    Bai, Dongshan
    Yao, Dezhong
    Dai, Yutong
    Gu, Lin
    Yu, Chen
    Sun, Lichao
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (02) : 567 - 580
  • [25] Self-Knowledge Distillation for First Trimester Ultrasound Saliency Prediction
    Gridach, Mourad
    Savochkina, Elizaveta
    Drukker, Lior
    Papageorghiou, Aris T.
    Noble, J. Alison
    SIMPLIFYING MEDICAL ULTRASOUND, ASMUS 2022, 2022, 13565 : 117 - 127
  • [26] Automatic Diabetic Retinopathy Grading via Self-Knowledge Distillation
    Luo, Ling
    Xue, Dingyu
    Feng, Xinglong
    ELECTRONICS, 2020, 9 (09) : 1 - 13
  • [27] Decoupled Feature and Self-Knowledge Distillation for Speech Emotion Recognition
    Yu, Haixiang
    Ning, Yuan
    IEEE ACCESS, 2025, 13 : 33275 - 33285
  • [28] Dealing with partial labels by knowledge distillation
    Wang, Guangtai
    Huang, Jintao
    Lai, Yiqiang
    Vong, Chi-Man
    PATTERN RECOGNITION, 2025, 158
  • [29] Triplet Loss for Knowledge Distillation
    Oki, Hideki
    Abe, Motoshi
    Miyao, Jyunichi
    Kurita, Takio
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [30] Enhanced ProtoNet With Self-Knowledge Distillation for Few-Shot Learning
    Habib, Mohamed El Hacen
    Kucukmanisa, Ayhan
    Urhan, Oguzhan
    IEEE ACCESS, 2024, 12 : 145331 - 145340