Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Cited: 0
Authors
Johnson, Rie [1]
Zhang, Tong [2,3]
Affiliations
[1] RJ Res Consulting, New York, NY 11215, USA
[2] HKUST, Hong Kong, China
[3] Google Research, Mountain View, CA, USA
Keywords: STABILITY
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
As deep neural networks are highly expressive, it is important to find solutions with a small generalization gap (the difference between performance on the training data and on unseen data). Focusing on the stochastic nature of training, we first present a theoretical analysis in which the generalization gap is bounded in terms of what we call the inconsistency and instability of model outputs, both of which can be estimated on unlabeled data. An empirical study based on this analysis shows that instability and inconsistency are strongly predictive of the generalization gap in various settings. In particular, our findings indicate that inconsistency is a more reliable indicator of the generalization gap than the sharpness of the loss landscape. Furthermore, we show that algorithmic reduction of inconsistency leads to superior performance. The results also provide a theoretical basis for existing methods such as co-distillation and ensembling.
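To make the abstract's central quantity concrete, below is a minimal sketch of one natural proxy for inconsistency: the average divergence between the predictive distributions of two models trained on the same data with different random seeds, evaluated on unlabeled inputs. This is an illustrative assumption, not the paper's exact definition; the function names and toy data are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def inconsistency(logits_a, logits_b, eps=1e-12):
    """Average KL divergence between the predictive distributions of two
    independently trained models on the same unlabeled inputs -- one simple
    proxy for output inconsistency (not the paper's exact formulation)."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return kl.mean()

# Toy usage: synthetic logits standing in for two training runs on 1000
# unlabeled examples with 10 classes (all values here are made up).
rng = np.random.default_rng(0)
run_a = rng.normal(size=(1000, 10))
run_b = run_a + 0.1 * rng.normal(size=(1000, 10))  # mildly perturbed second run
print(f"estimated inconsistency: {inconsistency(run_a, run_b):.4f}")
```

A low value under this proxy means the two runs largely agree on unlabeled data; the abstract's claim is that such agreement correlates with a small generalization gap.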
Pages: 27