From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels

Cited by: 24
Authors
Yang, Zhendong [1 ,2 ]
Zeng, Ailing [2 ]
Li, Zhe [3 ]
Zhang, Tianke [1 ]
Yuan, Chun [1 ]
Li, Yu [2 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Int Digital Econ Acad IDEA, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Funding
National Key R&D Program of China
DOI
10.1109/ICCV51070.2023.01576
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Knowledge Distillation (KD) uses the teacher's logits as soft labels to guide the student, while self-KD does not need a real teacher to provide the soft labels. This work unifies the formulations of the two tasks by decomposing and reorganizing the generic KD loss into a Normalized KD (NKD) loss and customized soft labels for both the target class (the image's category) and the non-target classes, named Universal Self-KD (USKD). We decompose the KD loss and find that its non-target term forces the student's non-target logits to match the teacher's, but the sums of the two sets of non-target logits differ, preventing them from being identical. NKD normalizes the non-target logits to equalize their sums. It can be generally used for KD and self-KD to make better use of the soft labels for distillation. USKD generates customized soft labels for both target and non-target classes without a teacher. It smooths the target logit of the student as the soft target label and uses the rank of the intermediate feature to generate the soft non-target labels with Zipf's law. For KD with teachers, NKD achieves state-of-the-art performance on CIFAR-100 and ImageNet, boosting the ImageNet Top-1 accuracy of Res-18 from 69.90% to 71.96% with a Res-34 teacher. For self-KD without teachers, USKD is the first method that can be effectively applied to both CNN and ViT models with negligible additional time and memory cost, resulting in new state-of-the-art results, such as 1.17% and 0.55% accuracy gains on ImageNet for MobileNet and DeiT-Tiny, respectively. Code is available at https://github.com/yzd-v/cls_KD.
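The sketches below illustrate the two ideas described in the abstract in PyTorch-style Python. They are minimal readings of the abstract only, not the authors' released code (see the linked repository); the function names and the hyperparameters gamma, temperature, and target_mass are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def nkd_style_loss(student_logits, teacher_logits, target,
                       gamma=1.5, temperature=1.0):
        # Minimal sketch of a normalized-KD-style loss (assumed form):
        # the non-target probabilities of student and teacher are renormalized
        # so each sums to one before the distillation term is computed.
        num_classes = student_logits.size(1)
        s = F.softmax(student_logits / temperature, dim=1)
        t = F.softmax(teacher_logits / temperature, dim=1)
        mask = F.one_hot(target, num_classes).bool()

        # Target term: the teacher's target probability guides the student's.
        target_loss = -(t[mask] * torch.log(s[mask] + 1e-8)).mean()

        # Non-target term on the renormalized (sum-to-one) distributions.
        s_non = s.masked_fill(mask, 0.0)
        t_non = t.masked_fill(mask, 0.0)
        s_non = s_non / s_non.sum(dim=1, keepdim=True)
        t_non = t_non / t_non.sum(dim=1, keepdim=True)
        non_target_loss = -(t_non * torch.log(s_non + 1e-8)).sum(dim=1).mean()

        return target_loss + gamma * (temperature ** 2) * non_target_loss

    def zipf_soft_labels(aux_logits, target, target_mass=0.9):
        # Sketch of Zipf-law soft non-target labels built from a class ranking.
        # aux_logits stands in for logits derived from an intermediate feature;
        # target_mass is an assumed smoothing value, not taken from the paper.
        ranks = aux_logits.argsort(dim=1, descending=True).argsort(dim=1) + 1
        zipf = 1.0 / ranks.float()                   # weight proportional to 1/rank
        zipf.scatter_(1, target.unsqueeze(1), 0.0)   # exclude the target class
        zipf = zipf / zipf.sum(dim=1, keepdim=True)  # renormalize non-target weights
        soft = zipf * (1.0 - target_mass)            # spread the remaining mass
        soft.scatter_(1, target.unsqueeze(1), target_mass)
        return soft

A typical use, under these assumptions, would be to derive aux_logits from an intermediate feature, build the Zipf-based soft labels, and combine them with a normalized loss as above; this is a reading of the abstract, not a reproduction of the authors' NKD/USKD implementation.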
Pages: 17139-17148
Number of pages: 10
Related Papers (50 in total)
  • [1] Self-knowledge distillation based on knowledge transfer from soft to hard examples
    Tang, Yuan
    Chen, Ying
    Xie, Linbo
    IMAGE AND VISION COMPUTING, 2023, 135
  • [2] Neighbor self-knowledge distillation
    Liang, Peng
    Zhang, Weiwei
    Wang, Junhuang
    Guo, Yufeng
    INFORMATION SCIENCES, 2024, 654
  • [3] Self-knowledge distillation with dimensional history knowledge
    Huang, Wenke
    Ye, Mang
    Shi, Zekun
    Li, He
    Du, Bo
    SCIENCE CHINA INFORMATION SCIENCES, 2025, 68 (9)
  • [4] Self-knowledge distillation via dropout
    Lee, Hyoje
    Park, Yeachan
    Seo, Hyun
    Kang, Myungjoo
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 233
  • [5] Dual teachers for self-knowledge distillation
    Li, Zheng
    Li, Xiang
    Yang, Lingfeng
    Song, Renjie
    Yang, Jian
    Pan, Zhigeng
    PATTERN RECOGNITION, 2024, 151
  • [6] Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation
    Yin, Zimo
    Pu, Jian
    Zhou, Yijie
    Xue, Xiangyang
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (11) : 2270 - 2283
  • [7] Teaching Yourself: A Self-Knowledge Distillation Approach to Action Recognition
    Vu, Duc-Quang
    Le, Ngan
    Wang, Jia-Ching
    IEEE ACCESS, 2021, 9 : 105711 - 105723
  • [8] Sliding Cross Entropy for Self-Knowledge Distillation
    Lee, Hanbeen
    Kim, Jeongho
    Woo, Simon S.
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1044 - 1053
  • [9] Self-Knowledge Distillation with Progressive Refinement of Targets
    Kim, Kyungyul
    Ji, ByeongMoon
    Yoon, Doyoung
    Hwang, Sangheum
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6547 - 6556