Multi-level Logit Distillation

Cited by: 38
Authors
Jin, Ying [1 ]
Wang, Jiaqi [2 ]
Lin, Dahua [1 ,2 ]
Affiliations
[1] Chinese Univ Hong Kong, CUHK Sense Time Joint Lab, Hong Kong, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.02325
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Knowledge Distillation (KD) aims at distilling the knowledge from a large teacher model into a lightweight student model. Mainstream KD methods can be divided into two categories: logit distillation and feature distillation. The former is easy to implement but inferior in performance, while the latter is not applicable in some practical circumstances due to concerns such as privacy and safety. To address this dilemma, in this paper we explore a stronger logit distillation method that makes better use of the logit outputs. Concretely, we propose a simple yet effective approach to logit distillation via multi-level prediction alignment. Within this framework, prediction alignment is conducted not only at the instance level, but also at the batch and class levels, so that the student model learns instance predictions, input correlations, and category correlations simultaneously. In addition, a prediction augmentation mechanism based on model calibration further boosts performance. Extensive experimental results validate that our method consistently outperforms previous logit distillation methods, and even reaches performance competitive with mainstream feature distillation methods. Code is available at https://github.com/Jin-Ying/Multi-Level-Logit-Distillation.
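The abstract describes three alignment levels (instance, batch, class) plus temperature-based prediction augmentation. Below is a minimal PyTorch-style sketch of that idea, not the authors' released implementation: it assumes KL divergence for instance-level alignment and mean-squared alignment of prediction Gram matrices at the batch and class levels; the function name, temperature set, and equal loss weighting are illustrative choices.

```python
import torch
import torch.nn.functional as F

def multilevel_logit_distillation(student_logits, teacher_logits,
                                  temperatures=(1.0, 2.0, 4.0)):
    """Sketch of multi-level prediction alignment.

    student_logits, teacher_logits: (B, C) raw logits for one mini-batch.
    temperatures: softmax temperatures acting as prediction augmentation.
    """
    loss = student_logits.new_zeros(())
    for T in temperatures:
        # Softened predictions at this temperature.
        log_p_s = F.log_softmax(student_logits / T, dim=1)   # (B, C) log-probs
        p_s = log_p_s.exp()
        p_t = F.softmax(teacher_logits / T, dim=1)

        # Instance level: match each sample's prediction distribution
        # (T^2 scaling follows the usual KD convention).
        loss_inst = F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)

        # Batch level: match sample-to-sample (input) correlations.
        g_s = p_s @ p_s.t()                                   # (B, B)
        g_t = p_t @ p_t.t()
        loss_batch = F.mse_loss(g_s, g_t)

        # Class level: match category-to-category correlations.
        c_s = p_s.t() @ p_s                                   # (C, C)
        c_t = p_t.t() @ p_t
        loss_class = F.mse_loss(c_s, c_t)

        loss = loss + loss_inst + loss_batch + loss_class
    return loss / len(temperatures)
```

In use, the student's logits and the detached teacher logits for a mini-batch would be passed in and the returned term added to the standard cross-entropy loss; consult the linked repository for the exact formulation and hyperparameters.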
Pages: 24276-24285
Page count: 10