Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Cited by: 0
|
Authors
Johnson, Rie [1 ]
Zhang, Tong [2 ,3 ]
Affiliations
[1] RJ Res Consulting, New York, NY 11215 USA
[2] HKUST, Hong Kong, Peoples R China
[3] Google Res, Mountain View, CA USA
Keywords
STABILITY
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
As deep neural networks are highly expressive, it is important to find solutions with a small generalization gap (the difference between performance on the training data and on unseen data). Focusing on the stochastic nature of training, we first present a theoretical analysis in which a bound on the generalization gap depends on what we call the inconsistency and instability of model outputs, both of which can be estimated on unlabeled data. Our empirical study based on this analysis shows that instability and inconsistency are strongly predictive of the generalization gap in various settings. In particular, our findings indicate that inconsistency is a more reliable indicator of the generalization gap than the sharpness of the loss landscape. Furthermore, we show that algorithmic reduction of inconsistency leads to superior performance. The results also provide a theoretical basis for existing methods such as co-distillation and ensembles.
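The abstract does not spell out the estimators, so the following is a minimal sketch under stated assumptions: inconsistency is proxied here by the average pairwise KL divergence between the predictive distributions of models trained with different random seeds, evaluated on unlabeled data; the same function could serve as an instability proxy if the models were instead trained on different draws of the training set. The function name mean_pairwise_kl and the synthetic inputs are illustrative, not taken from the paper.

import numpy as np

def mean_pairwise_kl(probs, eps=1e-12):
    """Average pairwise KL divergence between the predictive
    distributions of several independently trained models.

    probs: array of shape (n_models, n_examples, n_classes), where
           probs[m] holds the softmax outputs of model m on the same
           set of (unlabeled) examples.
    """
    n_models = probs.shape[0]
    p = np.clip(probs, eps, 1.0)  # guard against log(0)
    total, pairs = 0.0, 0
    for i in range(n_models):
        for j in range(n_models):
            if i == j:
                continue
            # KL(p_i || p_j) per example, then averaged over examples
            kl = np.sum(p[i] * (np.log(p[i]) - np.log(p[j])), axis=-1)
            total += kl.mean()
            pairs += 1
    return total / pairs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the softmax outputs of 4 independently trained
    # models (e.g., different seeds) on 1000 unlabeled examples.
    logits = rng.normal(size=(4, 1000, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    print(f"estimated inconsistency proxy: {mean_pairwise_kl(probs):.4f}")

In this reading, a lower value means the independently trained models agree more closely on unlabeled data, which the abstract associates with a smaller generalization gap.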
Pages: 27
Related Papers
50 records in total
  • [1] Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
    Chen, Jinghui
    Zhou, Dongruo
    Tang, Yiqi
    Yang, Ziyan
    Cao, Yuan
    Gu, Quanquan
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 3267 - 3275
  • [2] Rademacher dropout: An adaptive dropout for deep neural network via optimizing generalization gap
    Wang, Haotian
    Yang, Wenjing
    Zhao, Zhenyu
    Luo, Tingjin
    Wang, Ji
    Tang, Yuhua
    NEUROCOMPUTING, 2019, 357 : 177 - 187
  • [3] Dependence of generalization of neural network on training set
    Sun, Gongxing
    Dai, Changjiang
    Dai, Guiliang
    Xiaoxing Weixing Jisuanji Xitong/Mini-Micro Systems, 1996, 17 (12): 401 - 404
  • [4] Visualization in Deep Neural Network Training
    Kollias, Stefanos
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2022, 31 (03)
  • [5] Training by Pairing Correlated Samples Improves Deep Network Generalization
    Phan, Duc H.
    Jones, Douglas L.
    ELECTRONICS, 2024, 13 (21)
  • [6] A novel RBF neural network with fast training and accurate generalization
    Wang, Lipo
    Liu, Bing
    Wan, Chunru
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2004, 3314 : 166 - 171
  • [7] RazorNet: Adversarial Training and Noise Training on a Deep Neural Network Fooled by a Shallow Neural Network
    Taheri, Shayan
    Salem, Milad
    Yuan, Jiann-Shiun
    BIG DATA AND COGNITIVE COMPUTING, 2019, 3 (03) : 1 - 17
  • [8] Generalization Error Analysis: Deep Convolutional Neural Network in Mammography
    Richter, Caleb D.
    Samala, Ravi K.
    Chan, Heang-Ping
    Hadjiiski, Lubomir
    Cha, Kenny
    MEDICAL IMAGING 2018: COMPUTER-AIDED DIAGNOSIS, 2018, 10575
  • [9] Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality
    Nakada, Ryumei
    Imaizumi, Masaaki
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21