Self-Knowledge Distillation via Progressive Associative Learning

Cited by: 1
Authors
Zhao, Haoran [1 ]
Bi, Yanxian [2 ]
Tian, Shuwen [1 ]
Wang, Jian [3 ]
Zhang, Peiying [4 ]
Deng, Zhaopeng [1 ]
Liu, Kai [5 ,6 ]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao 266520, Peoples R China
[2] CETC Acad Elect & Informat Technol Grp Co Ltd, China Acad Elect & Informat Technol, Beijing 100041, Peoples R China
[3] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[4] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China
[5] Tsinghua Univ, State Key Lab Space Network & Commun, Beijing 100084, Peoples R China
[6] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
knowledge distillation; neural network compression; edge computing; image classification; self distillation; NEURAL-NETWORKS; FACE RECOGNITION;
DOI
10.3390/electronics13112062
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
As a specific form of knowledge distillation (KD), self-knowledge distillation enables a student network to progressively distill its own knowledge without relying on a pretrained, complex teacher network. However, recent studies of self-KD have found that the additional dark knowledge captured by auxiliary architectures or data augmentation can produce better soft targets for enhancing the network, but at the cost of significantly more computation and/or parameters. Moreover, most existing self-KD methods extract the soft label as a supervisory signal from individual input samples, which overlooks knowledge of the relationships among categories. Inspired by human associative learning, we propose a simple yet effective self-KD method named associative learning for self-distillation (ALSD), which progressively distills richer knowledge about inter-category relationships across independent samples. Specifically, during distillation, the propagation of knowledge is weighted by the intersample relationship between associated samples drawn from different minibatches, which is progressively estimated with the current network. In this way, the ALSD framework ensembles knowledge progressively across multiple samples using a single network, incurring minimal computational and memory overhead compared with existing ensembling methods. Extensive experiments demonstrate that ALSD consistently boosts the classification performance of various architectures on multiple datasets. Notably, ALSD pushes self-KD performance to 80.10% on CIFAR-100, exceeding the standard backpropagation baseline by 4.81%. Furthermore, the proposed method performs comparably to state-of-the-art knowledge distillation methods without requiring a pretrained teacher network.
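The abstract's central mechanism is a similarity-weighted transfer of soft targets between associated samples drawn from different minibatches. As a minimal illustrative sketch only, the PyTorch-style loss below shows one way such cross-minibatch, affinity-weighted self-distillation could be realized; the function name alsd_loss, the feature/logit interface, the temperatures, and the loss weighting are assumptions for illustration, not the authors' published implementation.

    # Illustrative sketch (assumed interface, not the authors' code):
    # cross-entropy plus a distillation term whose soft targets are an
    # affinity-weighted mixture of softened predictions on samples
    # buffered from a previous minibatch, as described in the abstract.
    import torch
    import torch.nn.functional as F

    def alsd_loss(feats, logits, prev_feats, prev_logits, labels,
                  T=4.0, alpha=0.5, sim_temp=0.1):
        # Supervised term on the current minibatch.
        ce = F.cross_entropy(logits, labels)

        # Intersample affinity between current and previous minibatch
        # features (cosine similarity, softmax-normalized per row).
        sim = F.normalize(feats, dim=1) @ F.normalize(prev_feats, dim=1).t()
        weights = F.softmax(sim / sim_temp, dim=1)

        # Associated soft targets: affinity-weighted mixture of the previous
        # minibatch's softened predictions (detached, so no gradient flows
        # back into the buffered targets).
        soft_targets = weights @ F.softmax(prev_logits.detach() / T, dim=1)

        # Standard temperature-scaled KL distillation term.
        kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                      soft_targets, reduction="batchmean") * (T * T)
        return ce + alpha * kd

    # Usage inside a training loop (prev_feats / prev_logits buffered,
    # detached, from the previous step; the model API is assumed):
    #   feats, logits = model(x, return_features=True)
    #   loss = alsd_loss(feats, logits, prev_feats, prev_logits, y)
    #   prev_feats, prev_logits = feats.detach(), logits.detach()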
Pages: 15