Self-knowledge distillation based on knowledge transfer from soft to hard examples
Cited by: 4
Authors:
Tang, Yuan [1]; Chen, Ying [1]; Xie, Linbo [2]
Affiliations:
[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Minist Educ, Engn Res Ctr Internet Things Technol Applicat, Wuxi 214122, Peoples R China
Keywords: Model compression; Self-knowledge distillation; Hard examples; Class probability consistency; Memory bank
DOI: 10.1016/j.imavis.2023.104700
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract:
To fully exploit knowledge from a self-knowledge distillation network, in which a student model is progressively trained to distill its own knowledge without a pre-trained teacher model, a self-knowledge distillation method based on knowledge transfer from soft to hard examples is proposed. A knowledge transfer module is designed to exploit the dark knowledge of hard examples by enforcing class probability consistency between hard and soft examples. It reduces the confidence of wrong predictions by transferring class information from the soft probability distributions of the auxiliary self-teacher network to the classifier network (the self-student network). Furthermore, a dynamic memory bank for softened probability distributions is introduced, together with its updating strategy. Experiments show that the method improves accuracy by 0.64% on average on classification datasets and by 3.87% on average on fine-grained visual recognition tasks, outperforming state-of-the-art methods. (c) 2023 Elsevier B.V. All rights reserved.
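The core idea in the abstract, matching a classifier's softened predictions on hard examples to stored soft distributions held in a memory bank, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the paper's exact formulation: the temperature, the exponential-moving-average update, the loss weighting, and all class/variable names (SoftLabelMemoryBank, knowledge_transfer_loss) are hypothetical choices made here for clarity.

```python
# Minimal sketch of knowledge transfer from soft to hard examples via a
# memory bank of softened probability distributions. Hyperparameters and
# the update rule are illustrative assumptions, not the authors' method.
import torch
import torch.nn.functional as F


class SoftLabelMemoryBank:
    """Stores one softened class-probability distribution per training example."""

    def __init__(self, num_samples: int, num_classes: int, momentum: float = 0.9):
        # Initialize with uniform distributions before any predictions exist.
        self.bank = torch.full((num_samples, num_classes), 1.0 / num_classes)
        self.momentum = momentum

    def update(self, indices: torch.Tensor, soft_probs: torch.Tensor) -> None:
        # Exponential-moving-average refresh of the stored soft labels
        # (one possible updating strategy; the paper presents its own).
        self.bank[indices] = (
            self.momentum * self.bank[indices]
            + (1.0 - self.momentum) * soft_probs.detach()
        )

    def get(self, indices: torch.Tensor) -> torch.Tensor:
        return self.bank[indices]


def knowledge_transfer_loss(student_logits, soft_targets, labels, temperature=4.0):
    """Hard-label cross-entropy plus a KL term that pulls the student's
    temperature-softened predictions toward the stored soft distributions."""
    ce = F.cross_entropy(student_logits, labels)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    kl = F.kl_div(log_p_student, soft_targets, reduction="batchmean") * temperature ** 2
    return ce + kl


# Toy usage: 8 training samples, 5 classes, a mini-batch of 4.
bank = SoftLabelMemoryBank(num_samples=8, num_classes=5)
logits = torch.randn(4, 5, requires_grad=True)        # student outputs for the batch
indices = torch.tensor([0, 3, 5, 7])                  # dataset indices of the batch
labels = torch.tensor([1, 0, 4, 2])
soft_targets = bank.get(indices)                      # retrieved softened distributions
loss = knowledge_transfer_loss(logits, soft_targets, labels)
loss.backward()
bank.update(indices, F.softmax(logits / 4.0, dim=1))  # refresh the bank after the step
```

In the full method the soft targets would come from the auxiliary self-teacher branch rather than random logits; the sketch only shows how consistency between hard-example predictions and banked soft distributions can be enforced as a KL penalty alongside the usual cross-entropy.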