From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels

Cited by: 24
Authors
Yang, Zhendong [1 ,2 ]
Zeng, Ailing [2 ]
Li, Zhe [3 ]
Zhang, Tianke [1 ]
Yuan, Chun [1 ]
Li, Yu [2 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Int Digital Econ Acad IDEA, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
DOI
10.1109/ICCV51070.2023.01576
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge Distillation (KD) uses the teacher's logits as soft labels to guide the student, while self-KD does not require a real teacher to provide the soft labels. This work unifies the formulations of the two tasks by decomposing and reorganizing the generic KD loss into a Normalized KD (NKD) loss and customized soft labels for both the target class (the image's category) and the non-target classes, named Universal Self-KD (USKD). We decompose the KD loss and find that its non-target term forces the student's non-target logits to match the teacher's, yet the sums of the two non-target distributions differ, preventing them from being identical. NKD normalizes the non-target logits to equalize their sums. It can be applied generally to KD and self-KD to make better use of the soft labels for distillation. USKD generates customized soft labels for both target and non-target classes without a teacher. It smooths the student's target logit as the soft target label and uses the rank of the intermediate feature to generate the soft non-target labels with Zipf's law. For KD with teachers, NKD achieves state-of-the-art performance on CIFAR-100 and ImageNet, boosting the ImageNet Top-1 accuracy of Res-18 from 69.90% to 71.96% with a Res-34 teacher. For self-KD without teachers, USKD is the first method that can be effectively applied to both CNN and ViT models with negligible additional time and memory cost, yielding new state-of-the-art results, such as 1.17% and 0.55% accuracy gains on ImageNet for MobileNet and DeiT-Tiny, respectively. Code is available at https://github.com/yzd-v/cls_KD.
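To make the normalization idea concrete, the PyTorch sketch below illustrates a normalized-KD-style loss as described in the abstract: the target-class term uses the teacher's target probability as a soft target label, and the non-target term compares student and teacher distributions re-normalized over the non-target classes only, so that both sum to one. The function name and the default temperature and gamma weights are illustrative assumptions, not the authors' released implementation; see https://github.com/yzd-v/cls_KD for the official code.

import torch
import torch.nn.functional as F

def nkd_style_loss(student_logits, teacher_logits, target, temperature=1.0, gamma=1.5):
    """Illustrative normalized-KD-style loss (a sketch, not the official NKD code)."""
    num_classes = student_logits.size(1)

    # Target term: the teacher's probability of the ground-truth class acts
    # as a soft target label for the student's target probability.
    s_prob = F.softmax(student_logits, dim=1)
    t_prob = F.softmax(teacher_logits, dim=1)
    s_target = s_prob.gather(1, target.unsqueeze(1)).clamp_min(1e-7)
    t_target = t_prob.gather(1, target.unsqueeze(1))
    target_loss = -(t_target * s_target.log()).mean()

    # Non-target term: drop the target class and re-normalize over the
    # remaining classes, so the student and teacher non-target distributions
    # both sum to one before they are matched.
    tgt_mask = F.one_hot(target, num_classes).bool()
    s_non = student_logits[~tgt_mask].view(-1, num_classes - 1)
    t_non = teacher_logits[~tgt_mask].view(-1, num_classes - 1)
    log_s_non = F.log_softmax(s_non / temperature, dim=1)
    p_t_non = F.softmax(t_non / temperature, dim=1)
    non_target_loss = -(p_t_non * log_s_non).sum(dim=1).mean()

    return target_loss + gamma * (temperature ** 2) * non_target_loss

In the self-KD setting, the abstract's USKD replaces the teacher distribution with customized soft labels (the smoothed student target logit as the soft target label, and Zipf's-law labels derived from the rank of the intermediate feature for the non-target classes); that part is not covered by this sketch.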
Pages: 17139 - 17148
Page count: 10
Related Papers
50 records in total
  • [31] Knowledge Augmentation for Distillation: A General and Effective Approach to Enhance Knowledge Distillation
    Tang, Yinan
    Guo, Zhenhua
    Wang, Li
    Fan, Baoyu
    Cao, Fang
    Gao, Kai
    Zhang, Hongwei
    Li, Rengang
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON EFFICIENT MULTIMEDIA COMPUTING UNDER LIMITED RESOURCES, EMCLR 2024, 2024, : 23 - 31
  • [32] Uncertainty Driven Adaptive Self-Knowledge Distillation for Medical Image Segmentation
    Guo, Xutao
    Wang, Mengqi
    Xiang, Yang
    Yang, Yanwu
    Ye, Chenfei
    Wang, Haijun
    Ma, Ting
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025,
  • [33] Training a thin and shallow lane detection network with self-knowledge distillation
    Dai, Xuerui
    Yuan, Xue
    Wei, Xueye
    JOURNAL OF ELECTRONIC IMAGING, 2021, 30 (01)
  • [34] A Lightweight Convolution Network with Self-Knowledge Distillation for Hyperspectral Image Classification
    Xu, Hao
    Cao, Guo
    Deng, Lindiao
    Ding, Lanwei
    Xu, Ling
    Pan, Qikun
    Shang, Yanfeng
    FOURTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING, ICGIP 2022, 2022, 12705
  • [35] Self-knowledge distillation enhanced binary neural networks derived from underutilized information
    Zeng, Kai
    Wan, Zixin
    Gu, Hongwei
    Shen, Tao
    APPLIED INTELLIGENCE, 2024, 54 (06) : 4994 - 5014
  • [36] A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition
    Vu, Duc-Quang
    Thi-Thu-Trang Phung
    Wang, Jia-Ching
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [37] Subclass Knowledge Distillation with Known Subclass Labels
    Sajedi, Ahmad
    Lawryshyn, Yuri A.
    Plataniotis, Konstantinos N.
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,
  • [38] Knowledge Distillation from Single to Multi Labels: an Empirical Study
    Zhang, Youcai
    Qin, Yuzhuo
    Liu, Hengwei
    Zhang, Yanhao
    Li, Yaqian
    Gu, Xiaodong
    arXiv, 2023,
  • [39] AI-KD: Adversarial learning and Implicit regularization for self-Knowledge Distillation
    Kim, Hyungmin
    Suh, Sungho
    Baek, Sunghyun
    Kim, Daehwan
    Jeong, Daun
    Cho, Hansang
    Kim, Junmo
    KNOWLEDGE-BASED SYSTEMS, 2024, 293
  • [40] Lightweight Human Pose Estimation Based on Densely Guided Self-Knowledge Distillation
    Wu, Mingyue
    Zhao, Zhong-Qiu
    Li, Jiajun
    Tian, Weidong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT II, 2023, 14255 : 421 - 433