Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Cited by: 1
Authors
Xiong, Xia [1 ]
Chen, Yong-Cong [1 ]
Shi, Chunxiao [1 ]
Ao, Ping [2 ]
Affiliations
[1] Shanghai Univ, Shanghai Ctr Quantitat Life Sci, Phys Dept, Shanghai 200444, Peoples R China
[2] Sichuan Univ, Coll Biomed Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Compendex;
DOI
10.1088/0256-307X/40/8/080202
Chinese Library Classification
O4 [Physics];
Discipline Code
0702;
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss-function landscape under SGD [Feng Y and Tu Y Proc. Natl. Acad. Sci. USA 118 e2015617118 (2021)]. To investigate this seeming violation of statistical-physics principles, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. In general it differs from the cost function, and this difference resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential to yield better algorithms for the latter.
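How an effective potential distinct from the loss can reconcile the anomaly may be seen in a minimal sketch, not taken from the paper itself: assume SGD near a fixed point theta* is approximated by a linearized (Ornstein-Uhlenbeck) stochastic process with loss Hessian H and an anisotropic noise covariance D set by minibatch sampling and the learning rate. All symbols below are illustrative assumptions.

% Linearized SGD near a fixed point \theta^*, modeled as an Ornstein--Uhlenbeck process
d\theta_t = -H\,(\theta_t - \theta^*)\,dt + \sqrt{2D}\,dW_t ,
\qquad H = \nabla^2 L(\theta^*), \quad D = \text{SGD noise covariance}.

% The stationary weight covariance \Sigma solves a Lyapunov equation,
% and the stationary density is Gaussian:
H\Sigma + \Sigma H^{\top} = 2D ,
\qquad
P_{\mathrm{ss}}(\theta) \propto
\exp\!\Big[-\tfrac{1}{2}\,(\theta-\theta^*)^{\top}\Sigma^{-1}(\theta-\theta^*)\Big].

% This is Boltzmann-like in the effective potential
% \Phi(\theta) = \tfrac{1}{2}(\theta-\theta^*)^{\top}\Sigma^{-1}(\theta-\theta^*),
% which coincides with the loss L only when D \propto H (an Einstein-type relation).

In this sketch, when the noise D is not proportional to the curvature H, the weight variance along a direction need not shrink as the loss landscape gets sharper, so an inverse variance-flatness relation with respect to L can coexist with ordinary Boltzmann statistics with respect to the effective potential Phi.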
Pages: 5