Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

Times Cited: 0
Authors
Bai, Zhiwei [1 ]
Luo, Tao [1 ,2 ]
Xu, Zhi-Qin John [1 ]
Zhang, Yaoyu [1 ,3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Math Sci, Inst Nat Sci, Shanghai 200240, Peoples R China
[2] CMA Shanghai, Shanghai Artificial Intelligence Lab, Shanghai 200240, Peoples R China
[3] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Deep learning; loss landscape; embedding principle;
DOI
10.4208/csiam-am.SO-2023-0020
CLC Number
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
In this work, we delve into the relationship between deep and shallow neural networks (NNs), focusing on the critical points of their loss landscapes. We discover an embedding principle in depth: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. The key tool for our discovery is the critical lifting, which maps any critical point of a shallower network to critical manifolds of any deeper network while preserving the network output. To investigate the practical implications of this principle, we conduct a series of numerical experiments. The results confirm that deep networks do encounter these lifted critical points during training, leading to similar training dynamics across varying network depths. We provide theoretical and empirical evidence that, through the lifting operation, the lifted critical points exhibit increased degeneracy. This principle also offers insights into the optimization benefits of batch normalization and larger datasets, and enables practical applications such as network layer pruning. Overall, our discovery of the embedding principle in depth uncovers the depth-wise hierarchical structure of the deep learning loss landscape, which serves as a solid foundation for further study of the role of depth in DNNs.
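The abstract's lifting maps a shallower network's parameters into a deeper network without changing the output. The following sketch is not taken from the paper; it is a minimal illustration of the output-preserving part of such a construction under an assumed ReLU architecture: inserting an extra layer with identity weight and zero bias after a ReLU layer leaves the network function unchanged, because ReLU activations are nonnegative and relu(h) = h for h >= 0. The paper's critical lifting additionally preserves criticality and maps a point to a critical manifold; this sketch only verifies function preservation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shallow net: input -> hidden ReLU layer of width m -> linear output.
d_in, m, d_out = 3, 4, 2
W1, b1 = rng.normal(size=(m, d_in)), rng.normal(size=m)
W2, b2 = rng.normal(size=(d_out, m)), rng.normal(size=d_out)

def shallow(x):
    return W2 @ relu(W1 @ x + b1) + b2

# Lifted deeper net: insert an identity block (weight I, bias 0)
# after the ReLU layer.  Since h = relu(W1 @ x + b1) >= 0 entrywise,
# relu(I @ h + 0) == h, so the network function is unchanged.
W_mid, b_mid = np.eye(m), np.zeros(m)

def lifted(x):
    h = relu(W1 @ x + b1)
    h = relu(W_mid @ h + b_mid)   # acts as identity on nonnegative h
    return W2 @ h + b2

x = rng.normal(size=d_in)
print("outputs match:", np.allclose(shallow(x), lifted(x)))
```

The identity-block insertion is one simple instance of a depth-wise embedding; the paper characterizes the full set of such liftings and shows that the lifted points inherit, and typically increase, the degeneracy of the original critical point.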
Pages: 350-389 (40 pages)