Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Cited by: 1
Authors
Xiong, Xia [1 ]
Chen, Yong-Cong [1 ]
Shi, Chunxiao [1 ]
Ao, Ping [2 ]
Affiliations
[1] Shanghai Univ, Shanghai Ctr Quantitat Life Sci, Phys Dept, Shanghai 200444, Peoples R China
[2] Sichuan Univ, Coll Biomed Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Compendex;
DOI
10.1088/0256-307X/40/8/080202
Chinese Library Classification
O4 [Physics];
Discipline Code
0702;
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss-function landscape under SGD [Feng Y and Tu Y Proc. Natl. Acad. Sci. USA 118 e2015617118 (2021)]. To investigate this seeming violation of statistical-physics principles, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. In general it differs from the cost function, and this difference resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential to yield better algorithms for the latter.
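How an effective potential distinct from the loss can reconcile the anomaly may be seen in a minimal sketch, not taken from the paper itself: assume SGD near a fixed point theta* is approximated by a linearized (Ornstein-Uhlenbeck) stochastic process with loss Hessian H and an anisotropic noise covariance D set by minibatch sampling and the learning rate. All symbols below are illustrative assumptions.

% Linearized SGD near a fixed point \theta^*, modeled as an Ornstein--Uhlenbeck process
d\theta_t = -H\,(\theta_t - \theta^*)\,dt + \sqrt{2D}\,dW_t ,
\qquad H = \nabla^2 L(\theta^*), \quad D = \text{SGD noise covariance}.

% The stationary weight covariance \Sigma solves a Lyapunov equation,
% and the stationary density is Gaussian:
H\Sigma + \Sigma H^{\top} = 2D ,
\qquad
P_{\mathrm{ss}}(\theta) \propto
\exp\!\Big[-\tfrac{1}{2}\,(\theta-\theta^*)^{\top}\Sigma^{-1}(\theta-\theta^*)\Big].

% This is Boltzmann-like in the effective potential
% \Phi(\theta) = \tfrac{1}{2}(\theta-\theta^*)^{\top}\Sigma^{-1}(\theta-\theta^*),
% which coincides with the loss L only when D \propto H (an Einstein-type relation).

In this sketch, when the noise D is not proportional to the curvature H, the weight variance along a direction need not shrink as the loss landscape gets sharper, so an inverse variance-flatness relation with respect to L can coexist with ordinary Boltzmann statistics with respect to the effective potential Phi.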
Pages: 5