Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Cited by: 0
Authors
Johnson, Rie [1 ]
Zhang, Tong [2 ,3 ]
Affiliations
[1] RJ Res Consulting, New York, NY 11215 USA
[2] HKUST, Hong Kong, Peoples R China
[3] Google Res, Mountain View, CA USA
Keywords
STABILITY
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
As deep neural networks are highly expressive, it is important to find solutions with a small generalization gap (the difference between performance on the training data and on unseen data). Focusing on the stochastic nature of training, we first present a theoretical analysis in which a bound on the generalization gap depends on what we call the inconsistency and instability of model outputs, both of which can be estimated on unlabeled data. Our empirical study based on this analysis shows that instability and inconsistency are strongly predictive of the generalization gap in various settings. In particular, our findings indicate that inconsistency is a more reliable indicator of the generalization gap than the sharpness of the loss landscape. Furthermore, we show that algorithmic reduction of inconsistency leads to superior performance. The results also provide a theoretical basis for existing methods such as co-distillation and ensembling.
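The abstract notes that inconsistency and instability can be estimated from model outputs on unlabeled data. As a rough illustration only (the paper's exact estimators may differ), the sketch below measures disagreement between the predictive distributions of several training runs that differ only in random seed, using the average pairwise KL divergence on unlabeled inputs; the helper names `softmax` and `mean_pairwise_kl` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_pairwise_kl(prob_runs):
    """Average pairwise KL divergence between the predictive
    distributions of independently trained runs, averaged over
    unlabeled examples. This is one plausible disagreement measure,
    not necessarily the paper's exact definition of inconsistency.

    prob_runs: array of shape (num_runs, num_examples, num_classes).
    """
    eps = 1e-12
    p = np.clip(prob_runs, eps, 1.0)
    num_runs = p.shape[0]
    total, pairs = 0.0, 0
    for i in range(num_runs):
        for j in range(num_runs):
            if i == j:
                continue
            # KL(p_i || p_j) per example, summed over classes
            kl = (p[i] * (np.log(p[i]) - np.log(p[j]))).sum(axis=1)
            total += kl.mean()
            pairs += 1
    return total / pairs

# Toy example: random logits standing in for 4 training runs
# evaluated on 1000 unlabeled points with 10 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 1000, 10))
probs = softmax(logits.reshape(-1, 10)).reshape(4, 1000, 10)
print("estimated inconsistency:", mean_pairwise_kl(probs))
```

Under the paper's framework, a lower value of such a disagreement measure on unlabeled data would be expected to accompany a smaller generalization gap, which is the intuition behind reducing inconsistency algorithmically (e.g., via co-distillation).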
Pages: 27
Related Papers
50 items in total
  • [21] Quantification on the Generalization Performance of Deep Neural Network with Tychonoff Separation Axioms
    Pinto, Linu
    Gopalan, Sasi
    Balasubramaniam, P.
    INFORMATION SCIENCES, 2022, 608 : 262 - 285
  • [22] A deep convolutional neural network for topology optimization with perceptible generalization ability
    Wang, Dalei
    Xiang, Cheng
    Pan, Yue
    Chen, Airong
    Zhou, Xiaoyi
    Zhang, Yiquan
    ENGINEERING OPTIMIZATION, 2022, 54 (06) : 973 - 988
  • [23] Voltage Instability Prediction Using a Deep Recurrent Neural Network
    Hagmar, Hannes
    Tong, Lang
    Eriksson, Robert
    Tuan, Le Anh
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2021, 36 (01) : 17 - 27
  • [24] Characterizing Combustion Instability Using Deep Convolutional Neural Network
    Gangopadhyay, Tryambak
    Locurto, Anthony
    Boor, Paige
    Michael, James B.
    Sarkar, Soumik
    PROCEEDINGS OF THE ASME 11TH ANNUAL DYNAMIC SYSTEMS AND CONTROL CONFERENCE, VOL 1, 2018
  • [25] Towards Deep Neural Network Training on Encrypted Data
    Nandakumar, Karthik
    Ratha, Nalini
    Pankanti, Sharath
    Halevi, Shai
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 40 - 48
  • [26] Dedicated Deep Neural Network Architectures and Methods for Their Training
    Rozycki, P.
    Kolbusz, J.
    Wilamowski, B. M.
    INES 2015 - IEEE 19TH INTERNATIONAL CONFERENCE ON INTELLIGENT ENGINEERING SYSTEMS, 2015, : 73 - 78
  • [27] Universal mean-field upper bound for the generalization gap of deep neural networks
    Ariosto, S.
    Pacelli, R.
    Ginelli, F.
    Gherardi, M.
    Rotondo, P.
    PHYSICAL REVIEW E, 2022, 105 (06)
  • [28] MEMORY REDUCTION METHOD FOR DEEP NEURAL NETWORK TRAINING
    Shirahata, Koichi
    Tomita, Yasumoto
    Ike, Atsushi
    2016 IEEE 26TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2016
  • [29] Distributed Deep Neural Network Training on Edge Devices
    Benditkis, Daniel
    Keren, Aviv
    Mor-Yosef, Liron
    Avidor, Tomer
    Shoham, Neta
    Tal-Israel, Nadav
    SEC'19: PROCEEDINGS OF THE 4TH ACM/IEEE SYMPOSIUM ON EDGE COMPUTING, 2019, : 304 - 306
  • [30] Evolving a Deep Neural Network Training Time Estimator
    Pinel, Frederic
    Yin, Jian-xiong
    Hundt, Christian
    Kieffer, Emmanuel
    Varrette, Sebastien
    Bouvry, Pascal
    See, Simon
    OPTIMIZATION AND LEARNING, 2020, 1173 : 13 - 24