Self-Distillation Amplifies Regularization in Hilbert Space

Times Cited: 0
Authors
Mobahi, Hossein [1]
Farajtabar, Mehrdad [2]
Bartlett, Peter L. [1,3]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] DeepMind, Mountain View, CA USA
[3] Univ Calif Berkeley, Dept EECS, Berkeley, CA USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation, introduced in the deep learning context, is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining (and to iterate this loop possibly a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data. Why this happens, however, has been a mystery: the self-distillation dynamics does not receive any new information about the task and evolves solely by looping over training. To the best of our knowledge, there is no rigorous understanding of why this happens. This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is a Hilbert space and fitting is subject to ℓ2 regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance.
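To make the retraining loop above concrete, here is a minimal sketch of self-distillation using kernel ridge regression, a standard instance of ℓ2-regularized fitting in a (reproducing kernel) Hilbert space. The toy data, the RBF kernel and its width, the regularization strength, and the threshold used to count "effective" basis functions are illustrative assumptions rather than details taken from the paper.

```python
# Minimal self-distillation sketch with kernel ridge regression (an instance
# of l2-regularized fitting in an RKHS). All concrete choices below (data,
# kernel width, regularization strength, coefficient threshold) are
# illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem: noisy samples of a smooth target function.
x_train = np.sort(rng.uniform(-3.0, 3.0, size=40))
y_clean = np.sin(2.0 * x_train)
y_train = y_clean + 0.3 * rng.normal(size=x_train.shape)

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = rbf_kernel(x_train, x_train)
lam = 1e-2                      # l2 regularization strength
n_rounds = 6                    # number of self-distillation rounds
_, eigvecs = np.linalg.eigh(K)  # orthonormal eigenbasis of the kernel matrix

targets = y_train.copy()
for t in range(n_rounds):
    # Kernel ridge regression fit to the current targets:
    # alpha = (K + lam * n * I)^{-1} targets,  f(x_i) = (K alpha)_i
    alpha = np.linalg.solve(K + lam * len(x_train) * np.eye(len(x_train)),
                            targets)
    preds = K @ alpha

    # Express the fitted function (at the training points) in the kernel
    # eigenbasis and count directions that still carry non-negligible weight.
    coeffs = eigvecs.T @ preds
    effective_basis = int(np.sum(np.abs(coeffs) > 1e-3 * np.abs(coeffs).max()))
    err_vs_clean = np.mean((preds - y_clean) ** 2)  # proxy for over/under-fitting
    print(f"round {t}: effective basis directions = {effective_basis}, "
          f"MSE vs. noiseless signal = {err_vs_clean:.4f}")

    # Self-distillation step: the model's own predictions become the targets
    # for the next round of training.
    targets = preds
```

In this sketch each round rescales the solution's components in the kernel eigenbasis by factors strictly below one, so directions tied to small eigenvalues die off quickly while dominant ones persist longer. The printed count of retained basis directions therefore tends to shrink round after round, mirroring the progressive limitation of usable basis functions described in the abstract, and iterating long enough eventually drives the fit toward under-fitting.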
Pages: 11