Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

Cited by: 0

Authors:

Bai, Zhiwei [1]

Luo, Tao [1,2]

Xu, Zhi-Qin John [1]

Zhang, Yaoyu [1,3]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Math Sci, Inst Nat Sci, Shanghai 200240, Peoples R China

[2] CMA Shanghai, Shanghai Artificial Intelligence Lab, Shanghai 200240, Peoples R China

[3] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200240, Peoples R China

Source:

CSIAM TRANSACTIONS ON APPLIED MATHEMATICS | 2024 / Vol. 5 / No. 2

Funding:

National Natural Science Foundation of China; National Key R&D Program of China;

Keywords:

Deep learning; loss landscape; embedding principle;

DOI:

10.4208/csiam-am.SO-2023-0020

Chinese Library Classification number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

In this work, we delve into the relationship between deep and shallow neural networks (NNs), focusing on the critical points of their loss landscapes. We discover an embedding principle in depth: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. The key tool for our discovery is the critical lifting, which maps any critical point of a network to critical manifolds of any deeper network while preserving the outputs. To investigate the practical implications of this principle, we conduct a series of numerical experiments. The results confirm that deep networks do encounter these lifted critical points during training, leading to similar training dynamics across varying network depths. We provide theoretical and empirical evidence that the lifting operation increases the degeneracy of the lifted critical points. This principle also provides insights into the optimization benefits of batch normalization and larger datasets, and enables practical applications such as network layer pruning. Overall, our discovery of the embedding principle in depth uncovers the depth-wise hierarchical structure of the deep learning loss landscape, which serves as a solid foundation for further study of the role of depth in DNNs.
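
The abstract does not spell out how the critical lifting is constructed. As a rough illustration of the output-preserving part only, the sketch below assumes a fully connected ReLU network and inserts an identity layer after a hidden layer; because post-ReLU activations are non-negative, the extra layer leaves them unchanged. The function names `forward` and `lift_by_identity_layer` are hypothetical, the identity-layer insertion is a stand-in chosen for simplicity rather than the paper's construction, and the sketch does not check that critical points are mapped to critical points.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(params, x):
    """Fully connected network: ReLU on hidden layers, linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W.T + b)
    W, b = params[-1]
    return h @ W.T + b


def lift_by_identity_layer(params, position):
    """Return a network one layer deeper with the same outputs.

    Assumption (illustrative only): an identity weight matrix with zero bias
    is inserted after hidden layer `position` (1-based). Since the input to
    the new layer is a ReLU output (non-negative), ReLU(I h + 0) = h.
    """
    if not 1 <= position <= len(params) - 1:
        raise ValueError("position must index a hidden layer")
    width = params[position - 1][0].shape[0]
    identity_layer = (np.eye(width), np.zeros(width))
    return params[:position] + [identity_layer] + params[position:]


rng = np.random.default_rng(0)
# Shallow network with layer widths 3 -> 5 -> 4 -> 2 and random parameters.
shallow = [
    (rng.normal(size=(5, 3)), rng.normal(size=5)),
    (rng.normal(size=(4, 5)), rng.normal(size=4)),
    (rng.normal(size=(2, 4)), rng.normal(size=2)),
]
deep = lift_by_identity_layer(shallow, position=1)

x = rng.normal(size=(8, 3))
max_gap = float(np.max(np.abs(forward(shallow, x) - forward(deep, x))))
print(len(shallow), len(deep), max_gap)
```

The deeper network has one more layer than the shallow one, and the two agree on every input in the batch up to floating-point precision.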


Pages: 350-389

Number of pages: 40
