Self-Knowledge Distillation via Progressive Associative Learning

Times Cited: 1
|
Authors
Zhao, Haoran [1 ]
Bi, Yanxian [2 ]
Tian, Shuwen [1 ]
Wang, Jian [3 ]
Zhang, Peiying [4 ]
Deng, Zhaopeng [1 ]
Liu, Kai [5 ,6 ]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao 266520, Peoples R China
[2] CETC Acad Elect & Informat Technol Grp Co Ltd, China Acad Elect & Informat Technol, Beijing 100041, Peoples R China
[3] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[4] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China
[5] Tsinghua Univ, State Key Lab Space Network & Commun, Beijing 100084, Peoples R China
[6] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
knowledge distillation; neural network compression; edge computing; image classification; self distillation; NEURAL-NETWORKS; FACE RECOGNITION;
DOI
10.3390/electronics13112062
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
As a specific form of knowledge distillation (KD), self-knowledge distillation enables a student network to progressively distill its own knowledge without relying on a pretrained, complex teacher network. However, recent studies of self-KD have found that the additional dark knowledge captured by auxiliary architectures or data augmentation can create better soft targets for enhancing the network, but only at the cost of significantly more computation and/or parameters. Moreover, most existing self-KD methods extract the soft label as a supervisory signal from individual input samples, overlooking the knowledge of relationships among categories. Inspired by human associative learning, we propose a simple yet effective self-KD method named associative learning for self-distillation (ALSD), which progressively distills richer knowledge about the relationships between categories across independent samples. Specifically, during distillation, the propagation of knowledge is weighted by the intersample relationship between associated samples drawn from different minibatches, which is progressively estimated with the current network. In this way, the ALSD framework achieves progressive knowledge ensembling across multiple samples using a single network, resulting in minimal computational and memory overhead compared with existing ensembling methods. Extensive experiments demonstrate that ALSD consistently boosts the classification performance of various architectures on multiple datasets. Notably, ALSD pushes self-KD performance to 80.10% on CIFAR-100, exceeding standard backpropagation training by 4.81%. Furthermore, the proposed method performs comparably to state-of-the-art knowledge distillation methods without requiring a pretrained teacher network.
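The abstract describes weighting the propagation of knowledge by an intersample relationship between associated samples from different minibatches, estimated with the current network. The following is a minimal PyTorch-style sketch of that general idea only; the memory of past logits/features, the temperature tau, the mixing weight alpha, and the similarity-based ensembling of soft targets are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch: self-distillation where soft targets are ensembled
# from associated samples of an earlier minibatch, weighted by a feature
# similarity computed with the current network. Hyperparameters and the
# memory-bank interface are hypothetical.
import torch
import torch.nn.functional as F

def alsd_style_loss(logits, features, labels,
                    past_logits, past_features,
                    tau=4.0, alpha=0.5):
    """Cross-entropy plus a self-distillation term whose soft targets come
    from earlier-minibatch samples, weighted by intersample similarity.
    In practice, past_logits/past_features would be stored detached in a
    small memory bank."""
    # Standard supervised loss on the current minibatch.
    ce = F.cross_entropy(logits, labels)

    # Intersample relationship: cosine similarity between current features
    # and features of the associated (earlier-minibatch) samples.
    sim = F.normalize(features, dim=1) @ F.normalize(past_features, dim=1).T
    weights = F.softmax(sim / 0.1, dim=1)          # shape: (B_cur, B_past)

    # Ensemble the past softened predictions according to the weights.
    soft_targets = weights @ F.softmax(past_logits / tau, dim=1)

    # KL divergence between current softened predictions and the ensembled
    # soft targets (scaled by tau^2, as is conventional in distillation).
    kd = F.kl_div(F.log_softmax(logits / tau, dim=1),
                  soft_targets, reduction="batchmean") * tau * tau
    return ce + alpha * kd

if __name__ == "__main__":
    # Illustrative usage with random tensors: batch of 8, 100 classes,
    # 64-dimensional features for both the current and past minibatch.
    cur_logits, past_logits = torch.randn(8, 100), torch.randn(8, 100)
    cur_feats, past_feats = torch.randn(8, 64), torch.randn(8, 64)
    labels = torch.randint(0, 100, (8,))
    print(alsd_style_loss(cur_logits, cur_feats, labels,
                          past_logits, past_feats))
```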
Pages: 15
Related Papers
50 records in total
  • [21] OPTIMIZING MUSIC SOURCE SEPARATION IN COMPLEX AUDIO ENVIRONMENTS THROUGH PROGRESSIVE SELF-KNOWLEDGE DISTILLATION
    Han, ChangHeon
    Lee, SuHyun
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024: 13 - 14
  • [22] Self-knowledge distillation based on dynamic mixed attention
    Tang, Yuan
    Chen, Ying
    Kongzhi yu Juece/Control and Decision, 2024, 39 (12): 4099 - 4108
  • [23] Enhancing deep feature representation in self-knowledge distillation via pyramid feature refinement
    Yu, Hao
    Feng, Xin
    Wang, Yunlong
    PATTERN RECOGNITION LETTERS, 2024, 178 : 35 - 42
  • [24] MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition
    Yang, Chuanguang
    An, Zhulin
    Zhou, Helong
    Cai, Linhang
    Zhi, Xiang
    Wu, Jiwen
    Xu, Yongjun
    Zhang, Qian
    COMPUTER VISION, ECCV 2022, PT XXIV, 2022, 13684 : 534 - 551
  • [25] Adaptive lightweight network construction method for Self-Knowledge Distillation
    Lu, Siyuan
    Zeng, Weiliang
    Li, Xueshi
    Ou, Jiajun
    NEUROCOMPUTING, 2025, 624
  • [26] A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition
    Vu, Duc-Quang
    Thi-Thu-Trang Phung
    Wang, Jia-Ching
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [27] TASKED: Transformer-based Adversarial learning for human activity recognition using wearable sensors via Self-KnowledgE Distillation
    Suh, Sungho
    Rey, Vitor Fortes
    Lukowicz, Paul
    KNOWLEDGE-BASED SYSTEMS, 2023, 260
  • [28] SICNet: Learning selective inter-slice context via Mask-Guided Self-knowledge distillation for NPC segmentation
    Zhang, Jinhong
    Li, Bin
    Qiu, Qianhui
    Mo, Hongqiang
    Tian, Lianfang
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [29] Self-Knowledge Distillation for First Trimester Ultrasound Saliency Prediction
    Gridach, Mourad
    Savochkina, Elizaveta
    Drukker, Lior
    Papageorghiou, Aris T.
    Noble, J. Alison
    SIMPLIFYING MEDICAL ULTRASOUND, ASMUS 2022, 2022, 13565 : 117 - 127
  • [30] Decoupled Feature and Self-Knowledge Distillation for Speech Emotion Recognition
    Yu, Haixiang
    Ning, Yuan
    IEEE ACCESS, 2025, 13 : 33275 - 33285