Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector

Cited by: 7
Authors
Shang, Ronghua [1 ]
Li, Wenzheng [2 ]
Zhu, Songling [1 ]
Jiao, Licheng [1 ]
Li, Yangyang [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian, Shaanxi, Peoples R China
[2] Xidian Univ, Guangzhou Inst Technol, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; Linear classifier probes; Convolutional neural networks; Spatial attention; Model compression; NEURAL-NETWORKS; MODEL;
DOI
10.1016/j.neunet.2023.04.015
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Knowledge distillation (KD) has been widely used in model compression. However, in current multi-teacher KD algorithms, the student can only passively acquire the teachers' intermediate-layer knowledge in a single form, and all teachers apply an identical guiding scheme to the student. To solve these problems, this paper proposes a multi-teacher KD method based on the joint Guidance of Probe and Adaptive Corrector (GPAC). First, GPAC proposes a teacher selection strategy guided by the Linear Classifier Probe (LCP). This strategy allows the student to select better teachers at the intermediate layers, with teachers evaluated by the classification accuracy measured by the LCP. Then, GPAC designs an adaptive multi-teacher instruction mechanism. The mechanism uses instructional weights to emphasize the student's predicted direction and to reduce the student's difficulty in learning from the teachers. At the same time, each teacher can formulate its guiding scheme according to the Kullback-Leibler divergence loss between the student and itself. Finally, GPAC develops a multi-level mechanism for adjusting the spatial attention loss. This mechanism uses a piecewise function that varies with the number of epochs to adjust the spatial attention loss; the piecewise function divides the student's learning of spatial attention into three levels, which makes efficient use of the teachers' spatial attention. GPAC and current state-of-the-art distillation methods are tested on the CIFAR-10 and CIFAR-100 datasets. The experimental results demonstrate that the proposed method achieves higher classification accuracy. (c) 2023 Elsevier Ltd. All rights reserved.
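
The abstract only outlines how the adaptive instruction weights and the epoch-dependent spatial-attention schedule work. The following is a minimal PyTorch-style sketch of one plausible reading of those two mechanisms, not the authors' published implementation: the names kd_losses and spatial_attention, the softmax-over-negative-KL weighting, and the concrete three-stage coefficients are all illustrative assumptions.

    # Hypothetical sketch of two mechanisms described in the abstract:
    # (1) weighting each teacher's soft-label guidance by the KL divergence
    #     between the student's and that teacher's predictions, and
    # (2) a piecewise, epoch-dependent coefficient for a spatial-attention loss.
    import torch
    import torch.nn.functional as F

    def spatial_attention(feat: torch.Tensor) -> torch.Tensor:
        """Mean of squared activations over channels, L2-normalised per sample."""
        attn = feat.pow(2).mean(dim=1).flatten(1)      # (B, C, H, W) -> (B, H*W)
        return F.normalize(attn, dim=1)

    def kd_losses(student_logits, teacher_logits_list, student_feat,
                  teacher_feat_list, epoch, total_epochs, T=4.0):
        """Multi-teacher KL guidance plus a staged spatial-attention loss."""
        log_p_s = F.log_softmax(student_logits / T, dim=1)

        # Per-teacher KL divergence between student and teacher predictions.
        kls = torch.stack([
            F.kl_div(log_p_s, F.softmax(t / T, dim=1), reduction="batchmean") * T * T
            for t in teacher_logits_list
        ])

        # Assumed adaptive instructional weights: teachers whose predictions lie
        # closer to the student's direction (smaller KL) get larger weight,
        # which should ease the student's learning.
        weights = F.softmax(-kls.detach(), dim=0)
        kd_loss = (weights * kls).sum()

        # Assumed three-level piecewise schedule over training epochs for the
        # spatial-attention transfer loss (concrete values are placeholders).
        progress = epoch / total_epochs
        if progress < 1 / 3:
            beta = 0.1        # early stage: little attention transfer
        elif progress < 2 / 3:
            beta = 1.0        # middle stage: full attention transfer
        else:
            beta = 0.5        # late stage: task loss dominates again

        # Teacher and student feature maps are assumed to share spatial size.
        attn_s = spatial_attention(student_feat)
        attn_loss = torch.stack([
            F.mse_loss(attn_s, spatial_attention(f)) for f in teacher_feat_list
        ]).mean()

        return kd_loss + beta * attn_loss

Tying the weights to the negative KL divergence mirrors the abstract's statement that each teacher's guiding scheme follows the KL loss between itself and the student; the exact weighting rule, stage boundaries, and coefficients would have to be taken from the paper itself.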
Pages: 345-356
Number of pages: 12
Related papers
50 records in total
  • [31] MTMS: Multi-teacher Multi-stage Knowledge Distillation for Reasoning-Based Machine Reading Comprehension
    Zhao, Zhuo
    Xie, Zhiwen
    Zhou, Guangyou
    Huang, Jimmy Xiangji
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 1995 - 2005
  • [32] Building and road detection from remote sensing images based on weights adaptive multi-teacher collaborative distillation using a fused knowledge
    Chen, Ziyi
    Deng, Liai
    Gou, Jing
    Wang, Cheng
    Li, Jonathan
    Li, Dilong
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2023, 124
  • [33] Dissolved oxygen prediction in the Taiwan Strait with the attention-based multi-teacher knowledge distillation model
    Chen, Lei
    Lin, Ye
    Guo, Minquan
    Lu, Wenfang
    Li, Xueding
    Zhang, Zhenchang
    OCEAN & COASTAL MANAGEMENT, 2025, 265
  • [34] Semi-supervised lung adenocarcinoma histopathology image classification based on multi-teacher knowledge distillation
    Wang, Qixuan
    Zhang, Yanjun
    Lu, Jun
    Li, Congsheng
    Zhang, Yungang
    PHYSICS IN MEDICINE AND BIOLOGY, 2024, 69 (18)
  • [35] Adaptive weighted multi-teacher distillation for efficient medical imaging segmentation with limited data
    Ben Loussaief, Eddardaa
    Rashwan, Hatem A.
    Ayad, Mohammed
    Khalid, Adnan
    Puig, Domenec
    KNOWLEDGE-BASED SYSTEMS, 2025, 315
  • [36] Bi-Level Orthogonal Multi-Teacher Distillation
    Gong, Shuyue
    Wen, Weigang
    ELECTRONICS, 2024, 13 (16)
  • [37] MULTI-TEACHER KNOWLEDGE DISTILLATION FOR COMPRESSED VIDEO ACTION RECOGNITION ON DEEP NEURAL NETWORKS
    Wu, Meng-Chieh
    Chiu, Ching-Te
    Wu, Kun-Hsuan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2202 - 2206
  • [38] A Multi-teacher Knowledge Distillation Framework for Distantly Supervised Relation Extraction with Flexible Temperature
    Fei, Hongxiao
    Tan, Yangying
    Huang, Wenti
    Long, Jun
    Huang, Jincai
    Yang, Liu
    WEB AND BIG DATA, PT II, APWEB-WAIM 2023, 2024, 14332 : 103 - 116
  • [39] SA-MDRAD: sample-adaptive multi-teacher dynamic rectification adversarial distillation
    Li, Shuyi
    Yang, Xiaohan
    Cheng, Guozhen
    Liu, Wenyan
    Hu, Hongchao
    MULTIMEDIA SYSTEMS, 2024, 30 (04)
  • [40] UNIC: Universal Classification Models via Multi-teacher Distillation
    Sariyildiz, Mert Bulent
    Weinzaepfel, Philippe
    Lucas, Thomas
    Larlus, Diane
    Kalantidis, Yannis
    COMPUTER VISION-ECCV 2024, PT IV, 2025, 15062 : 353 - 371