Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

Cited: 0
Authors
Simsek, Berfin [1,2]
Ged, Francois [1]
Jacot, Arthur [1]
Spadaro, Francesco [1]
Hongler, Clement [1]
Gerstner, Wulfram [2]
Brea, Johanni [2]
Affiliations
[1] Ecole Polytechnique Federale de Lausanne (EPFL), Chair of Statistical Field Theory, Lausanne, Switzerland
[2] Ecole Polytechnique Federale de Lausanne (EPFL), Laboratory of Computational Neuroscience, Lausanne, Switzerland
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Funding
Swiss National Science Foundation
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $m := r^* + h$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$), and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
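The abstract's counting argument rests on two elementary invariances of the parameterization, both easy to check numerically. Below is a minimal NumPy sketch added here for illustration (not code from the paper; the function `two_layer` and all variable names are hypothetical): it verifies that permuting the hidden neurons of a two-layer network leaves the realized function unchanged, which is why an isolated minimum comes in factorially many permuted copies, and that splitting one neuron's output weight across a duplicated neuron yields a whole line, an affine subspace, of parameters realizing the same function.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer(x, W, a):
    # f(x) = sum_i a_i * tanh(w_i . x): a two-layer network of hidden width len(a)
    return a @ np.tanh(W @ x)

m = 4                          # hidden width
W = rng.normal(size=(m, 3))    # input weights, 3-dimensional inputs
a = rng.normal(size=m)         # output weights
x = rng.normal(size=3)         # a test input

# (1) Permutation symmetry: relabeling the hidden neurons leaves f unchanged,
# so an isolated minimum of a width-m layer comes in m! permuted copies.
perm = rng.permutation(m)
assert np.allclose(two_layer(x, W, a), two_layer(x, W[perm], a[perm]))

# (2) Neuron splitting: duplicate neuron 0 and split its output weight as
# a_0 = t*a_0 + (1-t)*a_0. The function is identical for every t, so the
# widened network has a whole line (an affine subspace) of equal-output
# parameters where the original network had a single point.
for t in np.linspace(-1.0, 2.0, 7):
    W_split = np.vstack([W, W[0]])            # width m+1, neuron 0 copied
    a_split = np.concatenate([a, [0.0]])
    a_split[0], a_split[m] = t * a[0], (1.0 - t) * a[0]
    assert np.allclose(two_layer(x, W, a), two_layer(x, W_split, a_split))

print("permutation and neuron-splitting invariances verified")
```

The splitting step sketches why, per the abstract, one extra neuron per layer suffices to connect the previously isolated permuted minima into a single manifold: moving along such lines and re-merging neurons allows passage between permutations without leaving the zero-loss set.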
Pages: 11
Related Papers
50 records in total (first 10 shown)
  • [1] How to Characterize The Landscape of Overparameterized Convolutional Neural Networks
    Gu, Yihong
    Zhang, Weizhong
    Fang, Cong
    Lee, Jason D.
    Zhang, Tong
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [2] Learning Invariances in Neural Networks
    Benton, Gregory
    Finzi, Marc
    Izmailov, Pavel
    Wilson, Andrew Gordon
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [3] Global Minima of Overparameterized Neural Networks
    Cooper, Yaim
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3(2): 676-691
  • [4] The Role of Regularization in Overparameterized Neural Networks
    Satpathi, Siddhartha
    Gupta, Harsh
    Liang, Shiyu
    Srikant, R.
2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020: 4683-4688
  • [5] Mathematical Models of Overparameterized Neural Networks
    Fang, Cong
    Dong, Hanze
    Zhang, Tong
PROCEEDINGS OF THE IEEE, 2021, 109(5): 683-703
  • [6] Encoding Involutory Invariances in Neural Networks
    Bhattacharya, Anwesh
    Mattheakis, Marios
    Protopapas, Pavlos
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [7] Convex Formulation of Overparameterized Deep Neural Networks
    Fang, Cong
    Gu, Yihong
    Zhang, Weizhong
    Zhang, Tong
IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68(8): 5340-5352
  • [8] Overparameterized neural networks implement associative memory
    Radhakrishnan, Adityanarayanan
    Belkin, Mikhail
    Uhler, Caroline
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2020, 117(44): 27162-27170
  • [9] Dynamics and Perturbations of Overparameterized Linear Neural Networks
    de Oliveira, Arthur Castello B.
    Siami, Milad
    Sontag, Eduardo D.
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2023: 7356-7361
  • [10] Deep Neural Networks with Efficient Guaranteed Invariances
    Rath, Matthias
    Condurache, Alexandru Paul
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023