A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliation
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference
DOI
Not available
Abstract
Adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are both widely used in deep learning. The former converge quickly but to lower accuracy, while the latter converge more slowly but reach higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is based on the idea of differencing. We take the difference between adjacent mini-batch gradients and maintain an exponential moving average of these gradient variations; the accumulated mean variation is then added to the current gradient to adjust the convergence direction of the algorithm and accelerate its convergence. Experimental results comparing SGD(MD) with other popular optimizers show that, thanks to the differencing scheme, it is significantly superior to SGD(M) and close to, and sometimes better than, adaptive methods such as Adam and RAdam.
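The abstract only outlines the update rule, so the following is a minimal sketch in Python/NumPy of how such a momentum-plus-difference step could look. The hyperparameter names (lr, beta for the momentum buffer, gamma for the difference average) and the exact way the accumulated variation is combined with the gradient and momentum term are assumptions for illustration, not the paper's definition of SGD(MD).

```python
import numpy as np

class SGDMD:
    """Illustrative sketch of an SGD-with-momentum-and-difference step.

    Based only on the abstract: keep an exponential moving average of the
    differences between adjacent mini-batch gradients and add that
    accumulated variation to the current gradient before a classical
    momentum update. Hyperparameter names and the exact combination rule
    are assumptions, not the paper's specification.
    """

    def __init__(self, lr=0.01, beta=0.9, gamma=0.9):
        self.lr = lr            # learning rate
        self.beta = beta        # momentum coefficient (assumed)
        self.gamma = gamma      # EMA coefficient for gradient differences (assumed)
        self.prev_grad = None   # gradient from the previous mini-batch
        self.diff_ema = None    # EMA of gradient differences
        self.momentum = None    # classical momentum buffer

    def step(self, params, grad):
        if self.prev_grad is None:
            # First step: no previous gradient yet, initialize buffers to zero.
            self.prev_grad = np.zeros_like(grad)
            self.diff_ema = np.zeros_like(grad)
            self.momentum = np.zeros_like(grad)

        # Difference between adjacent mini-batch gradients.
        diff = grad - self.prev_grad
        # Exponential moving average of the gradient variations.
        self.diff_ema = self.gamma * self.diff_ema + (1.0 - self.gamma) * diff
        # Add the accumulated mean variation to the current gradient (assumed form).
        adjusted_grad = grad + self.diff_ema
        # Standard momentum update on the adjusted gradient.
        self.momentum = self.beta * self.momentum + adjusted_grad
        params -= self.lr * self.momentum

        self.prev_grad = grad.copy()
        return params
```

In a training loop one would call params = opt.step(params, grad) once per mini-batch; with gamma = 0 the correction term vanishes and the sketch reduces to plain SGD with momentum.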
Pages: 3939 - 3953
Page count: 14
Related Papers
50 items in total
  • [31] Stochastic Chebyshev Gradient Descent for Spectral Optimization
    Han, Insu
    Avron, Haim
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [32] Design of Momentum Fractional Stochastic Gradient Descent for Recommender Systems
    Khan, Zeshan Aslam
    Zubair, Syed
    Alquhayz, Hani
    Azeem, Muhammad
    Ditta, Allah
    IEEE ACCESS, 2019, 7 : 179575 - 179590
  • [33] Stochastic gradient descent for optimization for nuclear systems
    Williams, Austin
    Walton, Noah
    Maryanski, Austin
    Bogetic, Sandra
    Hines, Wes
    Sobes, Vladimir
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [34] A stochastic variance reduced gradient method with adaptive step for stochastic optimization
    Li, Jing
    Xue, Dan
    Liu, Lei
    Qi, Rulei
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2024, 45 (03) : 1327 - 1342
  • [35] An adaptive enhancement method based on stochastic parallel gradient descent of glioma image
    Wang, Hongfei
    Peng, Xinhao
    Ma, ShiQing
    Wang, Shuai
    Xu, Chuan
    Yang, Ping
    IET IMAGE PROCESSING, 2023, 17 (14) : 3976 - 3985
  • [36] Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum
    Cong, Guojing
    Liu, Tianyi
    2020 IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2020) AND WORKSHOP ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR SCIENTIFIC APPLICATIONS (AI4S 2020), 2020, : 29 - 39
  • [37] Adaptive tuning of fuzzy membership functions for non-linear optimization using gradient descent method
    Vishnupad, PS
    Shin, YC
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 1999, 7 (01) : 13 - 25
  • [38] Adaptive Polyak Step-Size for Momentum Accelerated Stochastic Gradient Descent With General Convergence Guarantee
    Zhang, Jiawei
    Jin, Cheng
    Gu, Yuantao
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2025, 73 : 462 - 476
  • [39] Bandwidth estimation for adaptive optical systems based on stochastic parallel gradient descent optimization
    Yu, M
    Vorontsov, MA
    ADVANCED WAVEFRONT CONTROL: METHODS, DEVICES, AND APPLICATIONS II, 2004, 5553 : 189 - 199
  • [40] Fractional-order stochastic gradient descent method with momentum and energy for deep neural networks
    Zhou, Xingwen
    You, Zhenghao
    Sun, Weiguo
    Zhao, Dongdong
    Yan, Shi
    NEURAL NETWORKS, 2025, 181