A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited by: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliations
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference;
DOI
Not available
Abstract
Adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are both widely used in deep learning. The former typically converge quickly but reach lower final accuracy, whereas the latter converge more slowly but achieve higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is built on the idea of differencing. The method computes the difference between the gradients of adjacent mini-batches, maintains an exponential moving average of these gradient variations, and adds the accumulated mean variation to the current gradient, thereby adjusting the convergence direction and accelerating convergence. Experiments comparing SGD(MD) with other popular optimizers show that the differencing scheme makes SGD(MD) significantly better than SGD(M) and close to, and sometimes better than, adaptive methods such as Adam and RAdam.
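The abstract suggests a simple update rule: track an exponential moving average of the change between consecutive mini-batch gradients and add it to the current gradient before a standard momentum step. The sketch below illustrates that idea in Python; the exact combination rule and the hyperparameter names (beta for the EMA factor, lam for scaling the variation term) are assumptions for illustration, not the formula published in the paper.

```python
import numpy as np

def sgd_md_step(param, grad, state, lr=0.01, mu=0.9, beta=0.9, lam=1.0):
    """One SGD(MD)-style update step (illustrative sketch, not the paper's exact rule).

    Following the abstract: the difference between adjacent mini-batch gradients
    is tracked with an exponential moving average, and that accumulated mean
    variation is added to the current gradient before a classic momentum update.
    beta, lam, and the additive combination are assumptions made for this sketch.
    """
    prev_grad = state.get("prev_grad", np.zeros_like(grad))
    ema_diff = state.get("ema_diff", np.zeros_like(grad))
    velocity = state.get("velocity", np.zeros_like(grad))

    diff = grad - prev_grad                          # difference of adjacent mini-batch gradients
    ema_diff = beta * ema_diff + (1 - beta) * diff   # EMA of the gradient variation
    adjusted_grad = grad + lam * ema_diff            # adjust the convergence direction

    velocity = mu * velocity - lr * adjusted_grad    # standard momentum step
    param = param + velocity

    state.update(prev_grad=grad, ema_diff=ema_diff, velocity=velocity)
    return param, state

if __name__ == "__main__":
    # Toy quadratic objective f(w) = 0.5 * ||w||^2 with noisy gradients.
    rng = np.random.default_rng(0)
    w, state = np.ones(5), {}
    for _ in range(200):
        noisy_grad = w + 0.01 * rng.standard_normal(5)
        w, state = sgd_md_step(w, noisy_grad, state)
    print("final ||w|| =", np.linalg.norm(w))
```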
Pages: 3939-3953
Page count: 14
Related Papers
50 records in total
  • [1] A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference
    Yuan, Wei
    Hu, Fei
    Lu, Liangfu
    APPLIED INTELLIGENCE, 2022, 52 (04) : 3939 - 3953
  • [2] ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
    Srinivasan, Vishwak
    Sankar, Adepu Ravi
    Balasubramanian, Vineeth N.
    PROCEEDINGS OF THE ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA (CODS-COMAD'18), 2018, : 249 - 256
  • [3] Adaptive Stochastic Gradient Descent Method for Convex and Non-Convex Optimization
    Chen, Ruijuan
    Tang, Xiaoquan
    Li, Xiuting
    FRACTAL AND FRACTIONAL, 2022, 6 (12)
  • [4] The combination of particle swarm optimization and stochastic gradient descent with momentum
    Chen, Chi-Hua
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2022, 18 : 132 - 132
  • [5] On the Hyperparameters in Stochastic Gradient Descent with Momentum
    Shi, Bin
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [6] On the Generalization of Stochastic Gradient Descent with Momentum
    Ramezani-Kebrya, Ali
    Antonakopoulos, Kimon
    Cevher, Volkan
    Khisti, Ashish
    Liang, Ben
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 56
  • [7] Adaptive Sampling for Incremental Optimization Using Stochastic Gradient Descent
    Papa, Guillaume
    Bianchi, Pascal
    Clemencon, Stephan
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 317 - 331
  • [8] Stochastic Gradient Descent on a Tree: an Adaptive and Robust Approach to Stochastic Convex Optimization
    Vakili, Sattar
    Salgia, Sudeep
    Zhao, Qing
    2019 57TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2019, : 432 - 438
  • [9] Adaptive compensation of the effects of non-stationary thermal blooming based on the stochastic parallel gradient descent optimization method
    Carhart, GW
    Simer, GJ
    Vorontsov, MA
    ADVANCED WAVEFRONT CONTROL: METHODS, DEVICES, AND APPLICATIONS, 2003, 5162 : 28 - 36
  • [10] A stochastic gradient tracking algorithm with adaptive momentum for distributed optimization
    Li, Yantao
    Hu, Hanqing
    Zhang, Keke
    Lu, Qingguo
    Deng, Shaojiang
    Li, Huaqing
    NEUROCOMPUTING, 2025, 637