UNIC: Universal Classification Models via Multi-teacher Distillation

Times Cited: 0
Authors
Sariyildiz, Mert Bulent [1 ]
Weinzaepfel, Philippe [1 ]
Lucas, Thomas [1 ]
Larlus, Diane [1 ]
Kalantidis, Yannis [1 ]
Affiliations
[1] NAVER LABS Europe, Meylan, France
Source
Computer Vision - ECCV 2024
Keywords
Multi-teacher Distillation; Classification; Generalization; Knowledge Distillation; Ensemble
DOI
10.1007/978-3-031-73235-5_20
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pretrained models have become a commodity and offer strong results on a broad range of tasks. In this work, we focus on classification and seek to learn a unique encoder able to draw from several complementary pretrained models, aiming for even stronger generalization across a variety of classification tasks. We propose to learn such an encoder via multi-teacher distillation. We first thoroughly analyze standard distillation when driven by multiple strong teachers with complementary strengths. Guided by this analysis, we gradually propose improvements to the basic distillation setup. Among those, we enrich the architecture of the encoder with a ladder of expendable projectors, which increases the impact of intermediate features during distillation, and we introduce teacher dropping, a regularization mechanism that better balances the teachers' influence. Our final distillation strategy leads to student models of the same capacity as any of the teachers, while retaining or improving upon the performance of the best teacher for each task.
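The abstract describes multi-teacher feature distillation with expendable projectors and a teacher-dropping regularizer. Below is a minimal, hypothetical PyTorch sketch of that general recipe, not the authors' implementation: every name (StudentWithProjectors, distill_step), the toy encoder and teachers, the cosine-distance loss, and the drop probability are illustrative assumptions, and the sketch attaches a single projector per teacher to the final features rather than the ladder over intermediate features described in the paper.

# Hypothetical multi-teacher distillation sketch (not the authors' code).
# A student with expendable projection heads matches the features of several
# frozen teachers; "teacher dropping" randomly masks some teacher losses per step.
import random
import torch
import torch.nn as nn

class StudentWithProjectors(nn.Module):
    """Shared encoder plus one expendable projector per teacher (discarded after training)."""
    def __init__(self, feat_dim, teacher_dims):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        self.projectors = nn.ModuleList(nn.Linear(feat_dim, d) for d in teacher_dims)

    def forward(self, x):
        z = self.encoder(x)
        return z, [proj(z) for proj in self.projectors]

def distill_step(student, teachers, x, optimizer, drop_prob=0.5):
    """One update: cosine-distance loss to each teacher, with random teacher dropping."""
    _, projections = student(x)
    losses = []
    for proj, teacher in zip(projections, teachers):
        with torch.no_grad():                      # teachers are frozen
            t_feat = teacher(x)
        losses.append(1 - nn.functional.cosine_similarity(proj, t_feat, dim=-1).mean())
    # Teacher dropping: keep each teacher's loss with probability 1 - drop_prob,
    # but always keep at least one so the step is never empty.
    kept = [l for l in losses if random.random() > drop_prob] or [random.choice(losses)]
    loss = torch.stack(kept).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    teacher_dims = [64, 128]  # feature sizes of two hypothetical frozen teachers
    teachers = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, d)).eval() for d in teacher_dims]
    student = StudentWithProjectors(feat_dim=96, teacher_dims=teacher_dims)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    x = torch.randn(8, 3, 32, 32)  # dummy image batch
    print("loss:", distill_step(student, teachers, x, opt))

After training, the projectors would be discarded and only the shared encoder kept, which is what makes them "expendable" in the sense used by the abstract.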
Pages: 353-371
Number of Pages: 19