Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

Cited by: 0

Authors:

Bai, Zhiwei [1]

Luo, Tao [1,2]

Xu, Zhi-Qin John [1]

Zhang, Yaoyu [1,3]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Math Sci, Inst Nat Sci, Shanghai 200240, Peoples R China

[2] CMA Shanghai, Shanghai Artificial Intelligence Lab, Shanghai 200240, Peoples R China

[3] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200240, Peoples R China

Source:

CSIAM TRANSACTIONS ON APPLIED MATHEMATICS | 2024, Vol. 5, No. 2

Funding:

National Natural Science Foundation of China; National Key R&D Program of China

Keywords:

Deep learning; loss landscape; embedding principle

DOI:

10.4208/csiam-am.SO-2023-0020

Chinese Library Classification number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

In this work, we delve into the relationship between deep and shallow neural networks (NNs), focusing on the critical points of their loss landscapes. We discover an embedding principle in depth: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. The key tool for our discovery is the critical lifting, which maps any critical point of a network to critical manifolds of any deeper network while preserving the outputs. To investigate the practical implications of this principle, we conduct a series of numerical experiments. The results confirm that deep networks do encounter these lifted critical points during training, leading to similar training dynamics across varying network depths. We provide theoretical and empirical evidence that the lifted critical points produced by the lifting operation exhibit increased degeneracy. This principle also provides insights into the optimization benefits of batch normalization and larger datasets, and enables practical applications such as network layer pruning. Overall, our discovery of the embedding principle in depth uncovers the depth-wise hierarchical structure of the deep learning loss landscape, which serves as a solid foundation for further study of the role of depth in DNNs.


Pages: 350-389

Number of pages: 40
