Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Cited by: 1
Authors
Xiong, Xia [1 ]
Chen, Yong-Cong [1 ]
Shi, Chunxiao [1 ]
Ao, Ping [2 ]
Affiliations
[1] Shanghai Univ, Shanghai Ctr Quantitat Life Sci, Phys Dept, Shanghai 200444, Peoples R China
[2] Sichuan Univ, Coll Biomed Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1088/0256-307X/40/8/080202
Chinese Library Classification
O4 [Physics]
Discipline Classification Code
0702
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest for the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss landscape under SGD [Feng Y and Tu Y, Proc. Natl. Acad. Sci. USA 118, e2015617118 (2021)]. To investigate this seeming violation of statistical physics principles, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. It differs from the cost function in general and resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential for better algorithms for the latter.
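As a rough illustration of the phenomenon the abstract describes (not the authors' code): the following minimal Python sketch models SGD near a fixed point as a linear Langevin process on a quadratic loss L = (1/2) theta^T H theta with an assumed anisotropic noise covariance D. When D is not proportional to the Hessian H, the stationary weight variance follows D/H rather than the loss flatness 1/H, so variance can be larger along the sharper direction. All parameter values (H, D, eta, steps) are invented for illustration.

```python
import numpy as np

# Hedged toy model: SGD near a minimum approximated by the Langevin update
#   theta <- theta - eta * H @ theta + sqrt(2 * eta) * B @ xi,   D = B B^T.
# For diagonal H and D the stationary variance is Sigma_ii = D_ii / H_ii,
# which need not track the flatness 1 / H_ii when the noise is anisotropic.

rng = np.random.default_rng(0)
H = np.diag([4.0, 0.25])   # loss curvatures: direction 0 sharp, direction 1 flat
D = np.diag([2.0, 0.01])   # assumed noise covariance, stronger along the sharp direction

eta, steps = 1e-3, 200_000
theta = np.zeros(2)
samples = np.empty((steps, 2))
B = np.linalg.cholesky(2 * D)
for t in range(steps):
    noise = np.sqrt(eta) * (B @ rng.standard_normal(2))
    theta = theta - eta * (H @ theta) + noise
    samples[t] = theta

var = samples[steps // 2:].var(axis=0)            # empirical stationary variances
print("curvatures        :", np.diag(H))
print("weight variances  :", var)                 # larger along the *sharper* direction
print("Lyapunov predicted:", np.diag(D) / np.diag(H))
```

In this toy setting the stationary distribution is Boltzmann in an effective quadratic potential set by the Lyapunov equation H Sigma + Sigma H = 2D, which coincides with the loss only when the noise is isotropic. This is the sense, under the stated assumptions, in which a "true energy" function can differ from the cost function while the Boltzmann form still holds.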
Pages: 5