Generous teacher: Good at distilling knowledge for student learning

Cited by: 0
Authors
Ding, Yifeng [1]
Yang, Gaoming [1]
Yin, Shuting [1]
Zhang, Ji [2]
Fang, Xianjin [1]
Yang, Wencheng [2]
Affiliations
[1] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Peoples R China
[2] Univ Southern Queensland, Sch Math Phys & Comp, Toowoomba 4350, Australia
Keywords
Knowledge distillation; Generous teacher; Absorbing distilled knowledge; Decouple logit
DOI
10.1016/j.imavis.2024.105199
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation is a technique that aims to transfer valuable knowledge from a large, well-trained model (the teacher) to a lightweight model (the student), with the primary goal of improving the student's performance on a given task. In recent years, mainstream distillation methods have focused on modifying student learning styles, resulting in less attention being paid to the knowledge provided by the teacher. However, upon reexamining the knowledge transferred by the teacher, we find that it still has untapped potential, which is crucial to bridging the performance gap between teachers and students. Therefore, we study knowledge distillation from the teacher's perspective and introduce a novel teacher knowledge enhancement method termed "Generous Teacher." The Generous Teacher is a specially trained teacher model that can provide more valuable knowledge for the student model. This is achieved by integrating a standardly trained teacher (Standard Teacher) to assist in the training process of the Generous Teacher. As a result, the Generous Teacher accomplishes the task at hand and assimilates distilled knowledge from the Standard Teacher, effectively adapting to distillation teaching in advance. Specifically, we recognize that non-target class knowledge plays a crucial role in improving the distillation effect for students. To leverage this, we decouple logit outputs and selectively use the Standard Teacher's non-target class knowledge to enhance the Generous Teacher. By setting the temperature as a multiple of the logit standard deviation, we ensure that the additional knowledge absorbed by the Generous Teacher is more suitable for student distillation. Experimental results on standard benchmarks demonstrate that the Generous Teacher surpasses the Standard Teacher in terms of accuracy when applied to standard knowledge distillation. Furthermore, the Generous Teacher can be seamlessly integrated into existing distillation methods, bringing general improvements at a low additional computational cost. The code will be publicly available at https://github.com/EifelTing/Generous-Teacher.
Pages: 16
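A minimal, hypothetical PyTorch sketch of the mechanism described in the abstract follows: while the Generous Teacher learns its own task, the Standard Teacher's logits are decoupled and only the non-target-class distribution is distilled into it, with the temperature set as a multiple of the per-sample logit standard deviation. The function names and hyperparameters below (generous_teacher_loss, remove_target_class, the multiplier k, the weight alpha) are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F


def remove_target_class(logits, targets):
    # Drop the target-class column, keeping the C-1 non-target logits per sample.
    b, c = logits.shape
    keep = torch.ones_like(logits, dtype=torch.bool)
    keep[torch.arange(b), targets] = False
    return logits[keep].view(b, c - 1)


def generous_teacher_loss(gen_logits, std_logits, targets, k=2.0, alpha=1.0):
    """Task loss plus non-target-class knowledge absorbed from the Standard Teacher.

    k (temperature = k * logit std) and alpha (loss weight) are illustrative values.
    """
    # Decouple the logits: only non-target classes carry the transferred knowledge.
    gen_nt = remove_target_class(gen_logits, targets)
    std_nt = remove_target_class(std_logits, targets)

    # Temperature as a multiple of each sample's logit standard deviation.
    t_gen = k * gen_nt.std(dim=1, keepdim=True)
    t_std = k * std_nt.std(dim=1, keepdim=True)

    # KL divergence between the two non-target-class distributions.
    log_p = F.log_softmax(gen_nt / t_gen, dim=1)
    q = F.softmax(std_nt / t_std, dim=1)
    absorb = F.kl_div(log_p, q, reduction="batchmean")

    # The Generous Teacher still learns the task itself via cross-entropy.
    task = F.cross_entropy(gen_logits, targets)
    return task + alpha * absorb


if __name__ == "__main__":
    # Toy usage with random tensors; Standard Teacher logits are kept frozen.
    gen_logits = torch.randn(8, 100, requires_grad=True)
    with torch.no_grad():
        std_logits = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    loss = generous_teacher_loss(gen_logits, std_logits, targets)
    loss.backward()
    print(loss.item())
```

Removing the target-class column before the softmax keeps the target probability from dominating the transferred distribution, which is one plausible way to realize the abstract's emphasis on non-target class knowledge.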