Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Cited: 0
Authors
Johnson, Rie [1]
Zhang, Tong [2,3]
Affiliations
[1] RJ Res Consulting, New York, NY 11215, USA
[2] HKUST, Hong Kong, China
[3] Google Research, Mountain View, CA, USA
Keywords: STABILITY
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
As deep neural networks are highly expressive, it is important to find solutions with a small generalization gap (the difference between performance on the training data and on unseen data). Focusing on the stochastic nature of training, we first present a theoretical analysis in which the generalization gap is bounded in terms of what we call the inconsistency and instability of model outputs, both of which can be estimated on unlabeled data. An empirical study based on this analysis shows that instability and inconsistency are strongly predictive of the generalization gap in various settings. In particular, our findings indicate that inconsistency is a more reliable indicator of the generalization gap than the sharpness of the loss landscape. Furthermore, we show that algorithmic reduction of inconsistency leads to superior performance. The results also provide a theoretical basis for existing methods such as co-distillation and ensembling.
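To make the abstract's central quantity concrete, below is a minimal sketch of one natural proxy for inconsistency: the average divergence between the predictive distributions of two models trained on the same data with different random seeds, evaluated on unlabeled inputs. This is an illustrative assumption, not the paper's exact definition; the function names and toy data are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def inconsistency(logits_a, logits_b, eps=1e-12):
    """Average KL divergence between the predictive distributions of two
    independently trained models on the same unlabeled inputs -- one simple
    proxy for output inconsistency (not the paper's exact formulation)."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return kl.mean()

# Toy usage: synthetic logits standing in for two training runs on 1000
# unlabeled examples with 10 classes (all values here are made up).
rng = np.random.default_rng(0)
run_a = rng.normal(size=(1000, 10))
run_b = run_a + 0.1 * rng.normal(size=(1000, 10))  # mildly perturbed second run
print(f"estimated inconsistency: {inconsistency(run_a, run_b):.4f}")
```

A low value under this proxy means the two runs largely agree on unlabeled data; the abstract's claim is that such agreement correlates with a small generalization gap.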
Pages: 27