Self-Knowledge Distillation via Progressive Associative Learning

Cited: 1
Authors
Zhao, Haoran [1]
Bi, Yanxian [2]
Tian, Shuwen [1]
Wang, Jian [3]
Zhang, Peiying [4]
Deng, Zhaopeng [1]
Liu, Kai [5,6]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao 266520, Peoples R China
[2] CETC Acad Elect & Informat Technol Grp Co Ltd, China Acad Elect & Informat Technol, Beijing 100041, Peoples R China
[3] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[4] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China
[5] Tsinghua Univ, State Key Lab Space Network & Commun, Beijing 100084, Peoples R China
[6] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
knowledge distillation; neural network compression; edge computing; image classification; self-distillation; neural networks; face recognition
DOI
10.3390/electronics13112062
Chinese Library Classification
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
As a specific form of knowledge distillation (KD), self-knowledge distillation enables a student network to progressively distill its own knowledge without relying on a pretrained, complex teacher network. However, recent studies of self-KD have found that the additional dark knowledge captured by auxiliary architectures or data augmentation can create better soft targets for enhancing the network, but at the cost of significantly more computation and/or parameters. Moreover, most existing self-KD methods extract the soft label as a supervisory signal from individual input samples, overlooking the knowledge of relationships among categories. Inspired by human associative learning, we propose a simple yet effective self-KD method named associative learning for self-distillation (ALSD), which progressively distills richer knowledge about the relationships between categories across independent samples. Specifically, during distillation, the propagation of knowledge is weighted according to the intersample relationship between associated samples generated in different minibatches, which is progressively estimated with the current network. In this way, our ALSD framework achieves progressive knowledge ensembling across multiple samples using a single network, incurring minimal computational and memory overhead compared with existing ensembling methods. Extensive experiments demonstrate that ALSD consistently boosts the classification performance of various architectures on multiple datasets. Notably, ALSD pushes self-KD performance to 80.10% on CIFAR-100, exceeding standard backpropagation by 4.81%. Furthermore, the proposed method achieves performance comparable to state-of-the-art knowledge distillation methods without a pretrained teacher network.
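The abstract describes weighting the transfer of soft targets by the estimated relationship between associated samples from different minibatches. The following is a minimal PyTorch sketch of a loss in that spirit; the function name, the cosine-similarity weighting, and the hyperparameter choices are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def alsd_style_loss(logits, labels, assoc_logits, assoc_feats, feats,
                    temperature=4.0, alpha=0.5):
    """Hypothetical self-distillation loss sketch in the spirit of ALSD.

    logits:       (B, C) predictions for the current minibatch
    labels:       (B,)   ground-truth class indices
    assoc_logits: (B, C) logits of associated samples from a previous minibatch
    feats, assoc_feats: (B, D) feature embeddings used to weight the transfer
    """
    # Standard cross-entropy on the hard labels.
    ce = F.cross_entropy(logits, labels)

    # Intersample weights: cosine similarity between current and associated
    # features, mapped into [0, 1] (an assumed stand-in for the paper's
    # progressively estimated intersample relationship).
    w = (F.cosine_similarity(feats, assoc_feats, dim=1) + 1.0) / 2.0  # (B,)

    # Soft targets come from the associated samples; no gradient flows
    # through them, mimicking a teacher signal from the same network.
    soft_targets = F.softmax(assoc_logits.detach() / temperature, dim=1)
    log_probs = F.log_softmax(logits / temperature, dim=1)

    # Per-sample KL divergence, weighted by the intersample similarity.
    kl = F.kl_div(log_probs, soft_targets, reduction="none").sum(dim=1)  # (B,)
    kd = (w * kl).mean() * temperature ** 2

    return (1.0 - alpha) * ce + alpha * kd
```

Because the "teacher" signal is just the same network's output on associated samples, this sketch needs no second model or auxiliary branch, which is consistent with the minimal-overhead claim in the abstract.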
Pages: 15