A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited by: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliations
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference
DOI
Not available
Abstract
Adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are widely used in deep learning. The former converge quickly but to lower accuracy, while the latter converge more slowly but reach higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is based on the idea of differencing. We take the difference between the gradients of adjacent mini-batches and maintain an exponential moving average of these gradient variations. The accumulated mean variation is then added to the current gradient, adjusting the convergence direction of the algorithm and accelerating its convergence. Experiments comparing SGD(MD) with other popular optimization methods show that, by exploiting the difference, SGD(MD) is significantly superior to SGD(M) and close to, and sometimes better than, adaptive methods including Adam and RAdam.
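The abstract describes the update only in words, so the following is a minimal NumPy sketch of one possible difference-adjusted momentum step. The class name SGDMD, the hyperparameters lr, momentum, and beta, and the exact way the terms are combined are assumptions made for illustration; the paper's actual equations may differ.

```python
import numpy as np

class SGDMD:
    """Illustrative sketch (not the paper's exact algorithm) of an
    SGD-with-momentum step adjusted by a moving average of gradient
    differences between adjacent mini-batches."""

    def __init__(self, lr=0.01, momentum=0.9, beta=0.9):
        self.lr = lr              # learning rate (assumed name/value)
        self.momentum = momentum  # momentum coefficient for the velocity term
        self.beta = beta          # decay rate for the moving average of differences
        self.prev_grad = None     # gradient from the previous mini-batch
        self.diff_avg = None      # exponential moving average of gradient differences
        self.velocity = None      # momentum buffer

    def step(self, params, grad):
        if self.prev_grad is None:
            self.prev_grad = np.zeros_like(grad)
            self.diff_avg = np.zeros_like(grad)
            self.velocity = np.zeros_like(grad)

        # Difference between the current and previous mini-batch gradients.
        diff = grad - self.prev_grad
        # Exponential moving average of the gradient variations.
        self.diff_avg = self.beta * self.diff_avg + (1.0 - self.beta) * diff
        # Add the accumulated mean variation to the current gradient
        # to adjust the convergence direction (assumed combination).
        adjusted_grad = grad + self.diff_avg
        # Standard momentum update applied to the adjusted gradient.
        self.velocity = self.momentum * self.velocity + adjusted_grad
        params -= self.lr * self.velocity

        self.prev_grad = grad.copy()
        return params
```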
Pages: 3939-3953
Page count: 14