Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Cited by: 1
Authors
Xiong, Xia [1 ]
Chen, Yong-Cong [1 ]
Shi, Chunxiao [1 ]
Ao, Ping [2 ]
Affiliations
[1] Shanghai Univ, Shanghai Ctr Quantitat Life Sci, Phys Dept, Shanghai 200444, Peoples R China
[2] Sichuan Univ, Coll Biomed Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Compendex;
DOI
10.1088/0256-307X/40/8/080202
Chinese Library Classification
O4 [Physics];
Discipline Classification Code
0702;
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss landscape under SGD [Feng Y and Tu Y, Proc. Natl. Acad. Sci. USA 118, e2015617118 (2021)]. To investigate this seeming violation of statistical-physics principles, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. It differs from the cost function in general and resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with potential for better algorithms for the latter.
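To make the quantities discussed in the abstract concrete, the following is a minimal, self-contained sketch (not the authors' code or the method of the paper) of how one might probe the variance-flatness relation empirically: run minibatch SGD on a toy quadratic loss near its minimum, then compare the stationary variance of the weights along each principal direction with the curvature (inverse flatness) of the loss along that direction. All choices here (dataset, learning rate, batch size, number of steps) are illustrative assumptions.

```python
# Sketch: weight variance vs. loss curvature for SGD near a fixed point.
# Toy anisotropic least-squares problem; hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

dim, n_samples = 2, 2000
w_true = np.zeros(dim)
X = rng.normal(size=(n_samples, dim)) * np.array([3.0, 0.5])  # anisotropic inputs
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def minibatch_grad(w, batch_size=32):
    """Gradient of 0.5 * mean((X w - y)^2) on a random minibatch."""
    idx = rng.integers(0, n_samples, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

# Run SGD and record the weight trajectory after a burn-in period.
lr, n_steps, burn_in = 0.01, 20000, 2000
w = rng.normal(size=dim)
traj = []
for t in range(n_steps):
    w = w - lr * minibatch_grad(w)
    if t >= burn_in:
        traj.append(w.copy())
traj = np.array(traj)

# Full-batch loss Hessian (curvature; flatness ~ 1/curvature) and the
# stationary covariance of the weights under SGD noise.
H = X.T @ X / n_samples
cov = np.cov(traj.T)

evals, evecs = np.linalg.eigh(H)
for lam, v in zip(evals, evecs.T):
    var_along_v = v @ cov @ v
    print(f"curvature {lam:8.3f}  ->  weight variance {var_along_v:.2e}")

# An equilibrium Boltzmann distribution in the cost function would imply larger
# variance along flatter (low-curvature) directions; the inverse trend reported
# by Feng and Tu is the anomaly that the paper's decomposition method resolves
# by identifying the true "energy" function, which differs from the cost.
```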
Pages: 5
Related Papers
50 records in total
  • [1] Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks
    Xiong, Xia
    Chen, Yong-Cong
    Shi, Chunxiao
    Ao, Ping
    CHINESE PHYSICS LETTERS, 2023, 40 (08) : 11 - 24
  • [2] Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks
    Xiong, Xia
    Chen, Yong-Cong
    Shi, Chunxiao
    Ao, Ping
    CHINESE PHYSICS LETTERS, 2023, (08) : 11 - 24
  • [3] The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima
    Feng, Yu
    Tu, Yuhai
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2021, 118 (09)
  • [4] Calibrated Stochastic Gradient Descent for Convolutional Neural Networks
    Zhuo, Li'an
    Zhang, Baochang
    Chen, Chen
    Ye, Qixiang
    Liu, Jianzhuang
    Doermann, David
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9348 - 9355
  • [5] Variance Reduced Stochastic Gradient Descent with Neighbors
    Hofmann, Thomas
    Lucchi, Aurelien
    Lacoste-Julien, Simon
    McWilliams, Brian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [6] Stochastic gradient descent with variance reduction technique
    Zhang, Jinjing
    Hu, Fei
    Xu, Xiaofei
    Li, Li
    WEB INTELLIGENCE, 2018, 16 (03) : 187 - 194
  • [7] Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
    Cui, Xiaodong
    Zhang, Wei
    Tuske, Zoltan
    Picheny, Michael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [8] Optimizing Deep Neural Networks Through Neuroevolution With Stochastic Gradient Descent
    Zhang, Haichao
    Hao, Kuangrong
    Gao, Lei
    Wei, Bing
    Tang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (01) : 111 - 121
  • [9] Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
    Cao, Yuan
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [10] Convergence of Hyperbolic Neural Networks Under Riemannian Stochastic Gradient Descent
    Whiting, Wes
    Wang, Bao
    Xin, Jack
    COMMUNICATIONS ON APPLIED MATHEMATICS AND COMPUTATION, 2024, 6 (02) : 1175 - 1188