From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels

Cited by: 24
Authors
Yang, Zhendong [1 ,2 ]
Zeng, Ailing [2 ]
Li, Zhe [3 ]
Zhang, Tianke [1 ]
Yuan, Chun [1 ]
Li, Yu [2 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Int Digital Econ Acad (IDEA), Shenzhen, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Funding
National Key R&D Program of China
DOI
10.1109/ICCV51070.2023.01576
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Knowledge Distillation (KD) uses the teacher's logits as soft labels to guide the student, while self-KD obtains the soft labels without a real teacher. This work unifies the formulations of the two tasks by decomposing and reorganizing the generic KD loss into a Normalized KD (NKD) loss and customized soft labels for both the target class (the image's category) and the non-target classes, named Universal Self-KD (USKD). We decompose the KD loss and find that its non-target part forces the student's non-target logits to match the teacher's, yet the two sets of non-target logits sum to different values, preventing them from being identical. NKD normalizes the non-target logits to equalize their sums. It can be generally used for KD and self-KD to make better use of the soft labels for distillation. USKD generates customized soft labels for both target and non-target classes without a teacher. It smooths the student's target logit as the soft target label and uses the rank of the intermediate feature to generate the soft non-target labels with Zipf's law. For KD with teachers, NKD achieves state-of-the-art performance on CIFAR-100 and ImageNet, boosting the ImageNet Top-1 accuracy of Res-18 from 69.90% to 71.96% with a Res-34 teacher. For self-KD without teachers, USKD is the first method that can be effectively applied to both CNN and ViT models with negligible additional time and memory cost, yielding new state-of-the-art results, such as 1.17% and 0.55% accuracy gains on ImageNet for MobileNet and DeiT-Tiny, respectively. Code is available at https://github.com/yzd-v/cls_KD.
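The two mechanisms described in the abstract can be made concrete with a minimal PyTorch-style sketch. This is not the authors' released implementation (see the linked repository for that): `nkd_loss_sketch` renormalizes the non-target probabilities of student and teacher over the non-target classes so both sum to one before the cross-entropy term is taken, and `zipf_nontarget_labels_sketch` assigns soft non-target labels proportional to 1/rank following Zipf's law. The function names and the hyperparameters `gamma` and `temp` are illustrative assumptions, not the paper's exact defaults.

```python
import torch
import torch.nn.functional as F


def nkd_loss_sketch(logit_s, logit_t, target, gamma=1.5, temp=1.0):
    """NKD-style loss sketch: the target term uses the teacher's target-class
    probability as a soft label; the non-target term renormalizes both
    non-target distributions so each sums to 1 before the cross-entropy."""
    N, C = logit_s.shape
    mask = F.one_hot(target, C).bool()                     # True at the target class

    # Target term: teacher's target probability weighting the student's log-prob.
    s_t = F.softmax(logit_t, dim=1)[mask]                  # (N,)
    log_s_s = F.log_softmax(logit_s, dim=1)[mask]          # (N,)
    target_term = -(s_t * log_s_s).mean()

    # Non-target term: softmax over the C-1 non-target classes only, so the
    # student and teacher distributions have the same (unit) sum.
    nt_s = F.log_softmax(logit_s[~mask].view(N, C - 1) / temp, dim=1)
    nt_t = F.softmax(logit_t[~mask].view(N, C - 1) / temp, dim=1)
    non_target_term = -(nt_t * nt_s).sum(dim=1).mean()

    return target_term + gamma * (temp ** 2) * non_target_term


def zipf_nontarget_labels_sketch(rank_scores, target):
    """USKD-style soft non-target labels: rank the non-target classes by some
    score (e.g., one derived from an intermediate feature) and assign weights
    proportional to 1/rank, normalized over the non-target classes."""
    N, C = rank_scores.shape
    mask = F.one_hot(target, C).bool()
    scores = rank_scores.masked_fill(mask, float('-inf'))  # push target class to the last rank
    ranks = scores.argsort(dim=1, descending=True).argsort(dim=1) + 1  # 1 = highest non-target score
    soft = (1.0 / ranks.float()).masked_fill(mask, 0.0)
    return soft / soft.sum(dim=1, keepdim=True)            # (N, C), zero at the target class
```

In a self-KD setting, labels from `zipf_nontarget_labels_sketch` could stand in for the teacher's normalized non-target distribution in the loss above, which is the role the abstract describes for USKD.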
Pages: 17139-17148
Page count: 10