Self-Distillation Amplifies Regularization in Hilbert Space

Cited by: 0
Authors
Mobahi, Hossein [1]
Farajtabar, Mehrdad [2]
Bartlett, Peter L. [1,3]
Affiliations
[1] Google Research, Mountain View, CA 94043, USA
[2] DeepMind, Mountain View, CA, USA
[3] University of California, Berkeley, Dept. of EECS, Berkeley, CA, USA
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Knowledge distillation, introduced in the deep learning context, is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining (and possibly to iterate this loop a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data. Why this happens, however, has been a mystery: the self-distillation dynamics do not receive any new information about the task and evolve solely by looping over training. To the best of our knowledge, there is no rigorous understanding of why this happens. This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is a Hilbert space and fitting is subject to ℓ2 regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance.
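To make the mechanism concrete, the following is a minimal sketch (not the authors' code) of self-distillation with kernel ridge regression, a standard instance of ℓ2-regularized fitting in a Hilbert space. The RBF kernel, bandwidth, regularization strength lam, number of rounds, and synthetic sine data are all illustrative assumptions. Each round refits the regressor to the previous round's predictions; the printed effective number of basis functions (the trace of the composed smoother) shrinks with every round, mirroring the progressive restriction of usable basis functions described in the abstract.

```python
import numpy as np

def rbf_kernel(X, Z, bandwidth=0.5):
    # Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 40)[:, None]                           # 40 training inputs in [0, 1]
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(40)  # noisy targets (toy data)

K = rbf_kernel(X, X)
lam = 1e-2                                  # ell_2 (RKHS-norm) regularization strength
eigvals = np.linalg.eigvalsh(K)             # spectrum of the Gram matrix
shrink = eigvals / (eigvals + lam)          # one round shrinks eigendirection i by this factor

targets = y.copy()
for t in range(1, 6):                       # self-distillation rounds
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), targets)
    preds = K @ alpha                       # round-t fit; equals the t-fold composed smoother applied to y
    eff_basis = np.sum(shrink ** t)         # basis functions still carrying non-negligible weight
    print(f"round {t}: effective #basis ~ {eff_basis:.2f}, "
          f"train MSE vs. original y = {np.mean((preds - y) ** 2):.4f}")
    targets = preds                         # distill: next round fits this round's predictions
```

On this toy example the effective dimension shrinks monotonically while the training error against the original labels grows with each round, which is the transition from reduced over-fitting to under-fitting that the abstract reports.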
Pages: 11
Related papers (50 in total)
  • [31] Wu, Lirong; Lin, Haitao; Gao, Zhangyang; Zhao, Guojiang; Li, Stan Z. A Teacher-Free Graph Knowledge Distillation Framework With Dual Self-Distillation. IEEE Transactions on Knowledge and Data Engineering, 2024, 36(9): 4375-4385.
  • [32] Liu, Sihan; Wang, Yue. Few-shot Learning with Online Self-Distillation. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021), 2021: 1067-1070.
  • [33] Chen, Xianyu; Jiang, Ming; Zhao, Qi. Self-Distillation for Few-Shot Image Captioning. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021: 545-555.
  • [34] Naeem, Muhammad Ferjad; Xiang, Yongqin; Zhai, Xiaohua; Hoyer, Lukas; Van Gool, Luc; Tombari, Federico. SILC: Improving Vision Language Pretraining with Self-distillation. Computer Vision - ECCV 2024, Pt XXI, 2025, 15079: 38-55.
  • [35] An, Shumin; Liao, Qingmin; Lu, Zongqing; Xue, Jing-Hao. Efficient Semantic Segmentation via Self-Attention and Self-Distillation. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(9): 15256-15266.
  • [36] Ren, Ning; Li, Xiaosong; Wu, Yanxia; Fu, Yan. Balanced self-distillation for long-tailed recognition. Knowledge-Based Systems, 2024, 290.
  • [37] Zhu, Xunyu; Li, Jian; Liu, Yong; Wang, Weiping. Improving Differentiable Architecture Search via self-distillation. Neural Networks, 2023, 167: 656-667.
  • [38] Hu, Haifeng; Feng, Yuyang; Li, Dapeng; Zhang, Suofei; Zhao, Haitao. Monocular Depth Estimation via Self-Supervised Self-Distillation. Sensors, 2024, 24(13).
  • [39] Rafiee, Nima; Gholamipoor, Rahil; Adaloglou, Nikolas; Jaxy, Simon; Ramakers, Julius; Kollmann, Markus. Self-supervised Anomaly Detection by Self-distillation and Negative Sampling. Artificial Neural Networks and Machine Learning - ICANN 2022, Pt IV, 2022, 13532: 459-470.
  • [40] Xia, Yuelong; Yun, Li-Jun; Yang, Chengfu. Transferable adversarial masked self-distillation for unsupervised domain adaptation. Complex & Intelligent Systems, 2023, 9(6): 6567-6580.