Self-Distillation Amplifies Regularization in Hilbert Space

Cited by: 0
Authors
Mobahi, Hossein [1]
Farajtabar, Mehrdad [2]
Bartlett, Peter L. [1,3]
Affiliations
[1] Google Research, Mountain View, CA 94043, USA
[2] DeepMind, Mountain View, CA, USA
[3] University of California, Berkeley, Dept. of EECS, Berkeley, CA, USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Knowledge distillation, introduced in the deep learning context, is a method for transferring knowledge from one architecture to another. In particular, when the two architectures are identical, it is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining (and possibly to iterate this loop a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data. Why this happens, however, has been a mystery: the self-distillation dynamics receives no new information about the task and evolves solely by looping over training. To the best of our knowledge, there is no rigorous understanding of why this happens. This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is a Hilbert space and fitting is subject to ℓ2 regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance.
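The loop described above is easy to state concretely. The sketch below runs self-distillation in the paper's setting of ℓ2-regularized function fitting, using kernel ridge regression as the Hilbert-space learner; the toy sine dataset, the scikit-learn KernelRidge model, and all hyperparameters are illustrative assumptions, not details taken from the paper.

    # Minimal self-distillation sketch: l2-regularized fitting in an RKHS
    # (kernel ridge regression). All data and hyperparameters are illustrative.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(-1.0, 1.0, size=(40, 1)), axis=0)
    y = np.sin(3.0 * X).ravel() + 0.3 * rng.standard_normal(40)  # noisy targets

    targets = y
    for step in range(5):  # each pass is one round of self-distillation
        model = KernelRidge(alpha=0.1, kernel="rbf", gamma=5.0)
        model.fit(X, targets)
        targets = model.predict(X)  # teacher predictions become the new targets
        print(f"round {step}: train MSE vs. original y =",
              np.mean((targets - y) ** 2))

Because each round re-applies the same regularizer to targets that are already regularized, the fit grows smoother round after round: the first iteration or two can smooth away label noise, while later iterations drift toward under-fitting, consistent with the behavior the abstract describes.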
Pages: 11
Related Papers (50 items total)
  • [11] Self-distillation for Surgical Action Recognition
    Yamlahi, Amine; Tran, Thuy Nuong; Godau, Patrick; Schellenberg, Melanie; Michael, Dominik; Smidt, Finn-Henri; Noelke, Jan-Hinrich; Adler, Tim J.; Tizabi, Minu Dietlinde; Nwoye, Chinedu Innocent; Padoy, Nicolas; Maier-Hein, Lena
    Medical Image Computing and Computer Assisted Intervention, MICCAI 2023, Pt. IX, 2023, 14228: 637-646
  • [12] Future Augmentation with Self-distillation in Recommendation
    Liu, Chong; Xie, Ruobing; Liu, Xiaoyang; Wang, Pinzheng; Zheng, Rongqin; Zhang, Lixin; Li, Juntao; Xia, Feng; Lin, Leyu
    Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track, ECML PKDD 2023, Pt. VI, 2023, 14174: 602-618
  • [14] Self-Distillation for Randomized Neural Networks
    Hu, Minghui; Gao, Ruobin; Suganthan, Ponnuthurai Nagaratnam
    IEEE Transactions on Neural Networks and Learning Systems, 2023, 35 (11): 1-10
  • [15] Image classification based on self-distillation
    Li, Yuting; Qing, Linbo; He, Xiaohai; Chen, Honggang; Liu, Qiang
    Applied Intelligence, 2023, 53 (08): 9396-9408
  • [16] Understanding Self-Distillation in the Presence of Label Noise
    Das, Rudrajit; Sanghavi, Sujay
    International Conference on Machine Learning, Vol. 202, 2023
  • [17] Deep Contrastive Representation Learning With Self-Distillation
    Xiao, Zhiwen; Xing, Huanlai; Zhao, Bowen; Qu, Rong; Luo, Shouxi; Dai, Penglin; Li, Ke; Zhu, Zonghai
    IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8 (01): 3-15
  • [18] Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation
    Borup, Kenneth; Andersen, Lars N.
    Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34
  • [19] Self-distillation and self-supervision for partial label learning
    Yu, Xiaotong; Sun, Shiding; Tian, Yingjie
    Pattern Recognition, 2024, 146
  • [20] Enhancing Tiny Tissues Segmentation via Self-Distillation
    Zhou, Chuan; Chen, Yuchu; Fan, Minghao; Wen, Yang; Chen, Hang; Chen, Leiting
    2020 IEEE International Conference on Bioinformatics and Biomedicine, 2020: 934-940