UNIC: Universal Classification Models via Multi-teacher Distillation

Times Cited: 0
Authors
Sariyildiz, Mert Bulent [1 ]
Weinzaepfel, Philippe [1 ]
Lucas, Thomas [1 ]
Larlus, Diane [1 ]
Kalantidis, Yannis [1 ]
Affiliations
[1] NAVER LABS Europe, Meylan, France
Source
Computer Vision - ECCV 2024
Keywords
Multi-teacher Distillation; Classification; Generalization; Knowledge Distillation; Ensemble
DOI
10.1007/978-3-031-73235-5_20
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pretrained models have become a commodity and offer strong results on a broad range of tasks. In this work, we focus on classification and seek to learn a unique encoder able to draw from several complementary pretrained models, aiming for even stronger generalization across a variety of classification tasks. We propose to learn such an encoder via multi-teacher distillation. We first thoroughly analyze standard distillation when driven by multiple strong teachers with complementary strengths. Guided by this analysis, we gradually propose improvements to the basic distillation setup. Among those, we enrich the architecture of the encoder with a ladder of expendable projectors, which increases the impact of intermediate features during distillation, and we introduce teacher dropping, a regularization mechanism that better balances the teachers' influence. Our final distillation strategy leads to student models of the same capacity as any of the teachers, while retaining or improving upon the performance of the best teacher for each task.
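The abstract describes multi-teacher feature distillation with expendable projectors and a teacher-dropping regularizer. Below is a minimal, hypothetical PyTorch sketch of that general recipe, not the authors' implementation: every name (StudentWithProjectors, distill_step), the toy encoder and teachers, the cosine-distance loss, and the drop probability are illustrative assumptions, and the sketch attaches a single projector per teacher to the final features rather than the ladder over intermediate features described in the paper.

# Hypothetical multi-teacher distillation sketch (not the authors' code).
# A student with expendable projection heads matches the features of several
# frozen teachers; "teacher dropping" randomly masks some teacher losses per step.
import random
import torch
import torch.nn as nn

class StudentWithProjectors(nn.Module):
    """Shared encoder plus one expendable projector per teacher (discarded after training)."""
    def __init__(self, feat_dim, teacher_dims):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        self.projectors = nn.ModuleList(nn.Linear(feat_dim, d) for d in teacher_dims)

    def forward(self, x):
        z = self.encoder(x)
        return z, [proj(z) for proj in self.projectors]

def distill_step(student, teachers, x, optimizer, drop_prob=0.5):
    """One update: cosine-distance loss to each teacher, with random teacher dropping."""
    _, projections = student(x)
    losses = []
    for proj, teacher in zip(projections, teachers):
        with torch.no_grad():                      # teachers are frozen
            t_feat = teacher(x)
        losses.append(1 - nn.functional.cosine_similarity(proj, t_feat, dim=-1).mean())
    # Teacher dropping: keep each teacher's loss with probability 1 - drop_prob,
    # but always keep at least one so the step is never empty.
    kept = [l for l in losses if random.random() > drop_prob] or [random.choice(losses)]
    loss = torch.stack(kept).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    teacher_dims = [64, 128]  # feature sizes of two hypothetical frozen teachers
    teachers = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, d)).eval() for d in teacher_dims]
    student = StudentWithProjectors(feat_dim=96, teacher_dims=teacher_dims)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    x = torch.randn(8, 3, 32, 32)  # dummy image batch
    print("loss:", distill_step(student, teachers, x, opt))

After training, the projectors would be discarded and only the shared encoder kept, which is what makes them "expendable" in the sense used by the abstract.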
Pages: 353-371
Number of Pages: 19