Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector

Cited by: 7
Authors
Shang, Ronghua [1 ]
Li, Wenzheng [2 ]
Zhu, Songling [1 ]
Jiao, Licheng [1 ]
Li, Yangyang [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian, Shaanxi, Peoples R China
[2] Xidian Univ, Guangzhou Inst Technol, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; Linear classifier probes; Convolutional neural networks; Spatial attention; Model compression; NEURAL-NETWORKS; MODEL;
DOI
10.1016/j.neunet.2023.04.015
CLC classification number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) has been widely used for model compression. However, in current multi-teacher KD algorithms, the student can only passively acquire knowledge from the teachers' middle layers in a single form, and all teachers apply an identical guiding scheme to the student. To address these problems, this paper proposes a multi-teacher KD method based on the joint Guidance of Probe and Adaptive Corrector (GPAC). First, GPAC introduces a teacher selection strategy guided by the Linear Classifier Probe (LCP), which allows the student to select better teachers at the middle layers; teachers are evaluated by the classification accuracy measured by the LCP. Second, GPAC designs an adaptive multi-teacher instruction mechanism. This mechanism uses instructional weights to emphasize the student's predicted direction and to reduce the difficulty of learning from the teachers; at the same time, each teacher formulates its own guiding scheme according to the Kullback-Leibler divergence loss between itself and the student. Finally, GPAC develops a multi-level mechanism for adjusting the spatial attention loss, using a piecewise function of the training epoch. This piecewise function divides the student's learning of spatial attention into three levels, which makes efficient use of the teachers' spatial attention. GPAC and current state-of-the-art distillation methods are evaluated on the CIFAR-10 and CIFAR-100 datasets, and the experimental results demonstrate that the proposed method achieves higher classification accuracy. (c) 2023 Elsevier Ltd. All rights reserved.
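For concreteness, the following PyTorch sketch illustrates one plausible reading of the three mechanisms described in the abstract: a linear classifier probe that scores a teacher's intermediate features, KL-divergence-based instruction weights over several teachers, and a three-level piecewise weight for the spatial attention loss. All function names, the closed-form ridge probe, the temperature T, and the schedule breakpoints are illustrative assumptions and are not taken from the paper.

    # Minimal sketch of the three GPAC components, based only on the abstract.
    import torch
    import torch.nn.functional as F

    def probe_accuracy(features, labels, num_classes):
        """Linear Classifier Probe (LCP): score frozen intermediate features
        with a linear head and report its accuracy. A closed-form ridge fit
        stands in for the probe's training loop (an assumption)."""
        x = features.flatten(1)                                  # (N, D)
        y = F.one_hot(labels, num_classes).float()               # (N, C)
        lam = 1e-3 * torch.eye(x.size(1), device=x.device)
        w = torch.linalg.solve(x.T @ x + lam, x.T @ y)           # ridge weights
        pred = (x @ w).argmax(dim=1)
        return (pred == labels).float().mean().item()

    def instruction_weights(teacher_logits_list, student_logits, T=4.0):
        """Adaptive instruction: weight each teacher by the KL divergence
        between its softened prediction and the student's, so teachers whose
        guidance is easier for the student to follow receive larger weights
        (one plausible reading; the paper's exact weighting may differ)."""
        log_s = F.log_softmax(student_logits / T, dim=1)
        kls = torch.stack([
            F.kl_div(log_s, F.softmax(t / T, dim=1), reduction="batchmean")
            for t in teacher_logits_list
        ])
        return F.softmax(-kls, dim=0)        # smaller KL -> larger weight

    def attention_loss_weight(epoch, total_epochs):
        """Three-level piecewise weight for the spatial attention loss.
        Breakpoints and values are assumptions; the paper defines the schedule."""
        if epoch < total_epochs // 3:
            return 1.0    # early stage: imitate the teachers' attention strongly
        if epoch < 2 * total_epochs // 3:
            return 0.5    # middle stage: relax the attention constraint
        return 0.1        # late stage: let the task loss dominate

    # Example usage with random tensors standing in for teacher features/logits.
    feats = torch.randn(128, 64, 8, 8)       # intermediate features of one teacher
    labels = torch.randint(0, 10, (128,))
    acc = probe_accuracy(feats, labels, num_classes=10)
    w = instruction_weights([torch.randn(128, 10) for _ in range(3)],
                            torch.randn(128, 10))
    print(acc, w, attention_loss_weight(epoch=50, total_epochs=240))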
Pages: 345-356
Number of pages: 12
Related papers
50 records in total
  • [21] Visual emotion analysis using skill-based multi-teacher knowledge distillation
    Cladiere, Tristan
    Alata, Olivier
    Ducottet, Christophe
    Konik, Hubert
    Legrand, Anne-Claire
    PATTERN ANALYSIS AND APPLICATIONS, 2025, 28 (02)
  • [22] mKDNAD: A network flow anomaly detection method based on multi-teacher knowledge distillation
    Yang, Yang
    Liu, Dan
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 314 - 319
  • [23] Named Entity Recognition Method Based on Multi-Teacher Collaborative Cyclical Knowledge Distillation
    Jin, Chunqiao
    Yang, Shuangyuan
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 230 - 235
  • [24] Enhancing BERT Performance: Multi-teacher Adversarial Distillation with Clean and Robust Guidance
    Wu, Xunjin
    Chang, Jingfei
    Cheng, Wen
    Wu, Yunxiang
    Li, Yong
    Zeng, Lingfang
    CONCEPTUAL MODELING, ER 2024, 2025, 15238 : 3 - 17
  • [25] MTKDSR: Multi-Teacher Knowledge Distillation for Super Resolution Image Reconstruction
    Yao, Gengqi
    Li, Zhan
    Bhanu, Bir
    Kang, Zhiqing
    Zhong, Ziyi
    Zhang, Qingfeng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 352 - 358
  • [26] MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings
    Wang, Kai
    Liu, Yu
    Ma, Qian
    Sheng, Quan Z.
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1716 - 1726
  • [27] Continual Learning with Confidence-based Multi-teacher Knowledge Distillation for Neural Machine Translation
    Guo, Jiahua
    Liang, Yunlong
    Xu, Jinan
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 336 - 343
  • [28] CIMTD: Class Incremental Multi-Teacher Knowledge Distillation for Fractal Object Detection
    Wu, Chuhan
    Luo, Xiaochuan
    Huang, Haoran
    Zhang, Yulin
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XII, 2025, 15042 : 51 - 65
  • [29] A Multi-Teacher Assisted Knowledge Distillation Approach for Enhanced Face Image Authentication
    Cheng, Tiancong
    Zhang, Ying
    Yin, Yifang
    Zimmermann, Roger
    Yu, Zhiwen
    Guo, Bin
    PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 135 - 143
  • [30] MULTI-TEACHER DISTILLATION FOR INCREMENTAL OBJECT DETECTION
    Jiang, Le
    Cheng, Hongqiang
    Ye, Xiaozhou
    Ouyang, Ye
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5520 - 5524