Self-Knowledge Distillation via Progressive Associative Learning

Cited by: 1
Authors
Zhao, Haoran [1]
Bi, Yanxian [2]
Tian, Shuwen [1]
Wang, Jian [3]
Zhang, Peiying [4]
Deng, Zhaopeng [1]
Liu, Kai [5,6]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao 266520, Peoples R China
[2] CETC Acad Elect & Informat Technol Grp Co Ltd, China Acad Elect & Informat Technol, Beijing 100041, Peoples R China
[3] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[4] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China
[5] Tsinghua Univ, State Key Lab Space Network & Commun, Beijing 100084, Peoples R China
[6] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
knowledge distillation; neural network compression; edge computing; image classification; self-distillation; neural networks; face recognition
DOI
10.3390/electronics13112062
CLC number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
As a specific form of knowledge distillation (KD), self-knowledge distillation enables a student network to progressively distill its own knowledge without relying on a pretrained, complex teacher network. However, recent studies of self-KD have found that the additional dark knowledge captured by auxiliary architectures or data augmentation can create better soft targets for enhancing the network, but at the cost of significantly more computation and/or parameters. Moreover, most existing self-KD methods extract the soft label as a supervisory signal from each individual input sample, which overlooks knowledge of the relationships among categories. Inspired by human associative learning, we propose a simple yet effective self-KD method named associative learning for self-distillation (ALSD), which progressively distills richer knowledge about the relationships between categories across independent samples. Specifically, during distillation, the propagation of knowledge is weighted according to the inter-sample relationship between associated samples drawn from different mini-batches, which is progressively estimated with the current network. In this way, the ALSD framework achieves knowledge ensembling progressively across multiple samples using a single network, incurring minimal computational and memory overhead compared with existing ensembling methods. Extensive experiments demonstrate that ALSD consistently boosts the classification performance of various architectures on multiple datasets. Notably, ALSD pushes self-KD performance to 80.10% on CIFAR-100, exceeding the standard backpropagation baseline by 4.81%. Furthermore, the proposed method performs comparably to state-of-the-art knowledge distillation methods without requiring a pretrained teacher network.
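The abstract describes weighting the propagation of knowledge by the inter-sample relationship between associated samples from different mini-batches, estimated with the current network. Purely as an illustration of that idea, the PyTorch-style sketch below shows one plausible form such an associative self-distillation loss could take; the function name associative_self_kd_loss, the cosine-similarity weighting, the temperature and alpha hyperparameters, and the cached previous-batch features/logits are assumptions made for this sketch, not the authors' published implementation.

# Hedged sketch of an associative self-distillation loss (NOT the authors' code).
# Assumption: soft targets come from logits cached from a previous mini-batch,
# and each pair of samples is weighted by feature-space similarity estimated
# with the current network.
import torch
import torch.nn.functional as F


def associative_self_kd_loss(features, logits, labels,
                             prev_features, prev_logits,
                             temperature=4.0, alpha=0.5):
    """Cross-entropy plus a similarity-weighted KD term from a previous batch.

    features, logits, labels   : current mini-batch tensors (B x D, B x C, B)
    prev_features, prev_logits : detached tensors cached from an earlier batch
    temperature, alpha         : assumed hyperparameters for this sketch
    """
    # Standard supervised loss on hard labels.
    ce = F.cross_entropy(logits, labels)

    # Inter-sample relationship estimated with the current network:
    # cosine similarity between current and previous-batch features.
    sim = F.cosine_similarity(features.unsqueeze(1),
                              prev_features.unsqueeze(0), dim=-1)   # B x B'
    weights = F.softmax(sim, dim=1)                                 # row-normalized

    # Associated soft targets: similarity-weighted mixture of cached logits.
    soft_targets = weights @ prev_logits                            # B x C

    # Temperature-scaled KL divergence between student outputs and soft targets.
    kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                  F.softmax(soft_targets / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2

    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    # Toy usage with random tensors standing in for two mini-batches.
    B, D, C = 8, 64, 100
    feats, lgts = torch.randn(B, D), torch.randn(B, C)
    prev_feats, prev_lgts = torch.randn(B, D), torch.randn(B, C)
    labels = torch.randint(0, C, (B,))
    loss = associative_self_kd_loss(feats, lgts, labels,
                                    prev_feats.detach(), prev_lgts.detach())
    print(float(loss))

In such a scheme, the cached previous-batch tensors would be detached so gradients flow only through the current batch, which is consistent with the abstract's claim of minimal computational and memory overhead relative to ensembling methods.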
Pages: 15