A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited by: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliations
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference
DOI
Not available
Abstract
Both adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are widely used in deep learning. The former converge quickly but to lower accuracy, whereas the latter converge more slowly but to higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is based on the idea of differencing: we compute the difference between adjacent mini-batch gradients and maintain an exponential moving average of these gradient variations. The accumulated mean variation is then added to the current gradient, adjusting the convergence direction of the algorithm and accelerating its convergence. Experiments against other popular optimization methods show that, by exploiting the difference, SGD(MD) is significantly superior to SGD(M) and is close to, and sometimes better than, adaptive methods such as Adam and RAdam.
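
The abstract describes SGD(MD) only at a high level, so the following Python sketch illustrates one way such an update could look: the difference between adjacent mini-batch gradients is folded into an exponential moving average, which is added to the current gradient before a standard momentum step. The function name sgd_md_step, the state layout, and the coefficients lr, momentum, and beta are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def sgd_md_step(param, grad, state, lr=0.01, momentum=0.9, beta=0.9):
    # Hypothetical single-parameter update in the spirit of SGD(MD);
    # the precise rule and coefficients are assumptions based on the abstract.
    prev_grad = state.get("prev_grad", np.zeros_like(grad))
    v = state.get("v", np.zeros_like(grad))   # EMA of gradient differences
    m = state.get("m", np.zeros_like(grad))   # classical momentum buffer

    diff = grad - prev_grad                   # difference of adjacent mini-batch gradients
    v = beta * v + (1.0 - beta) * diff        # exponential moving average of the variation
    adjusted = grad + v                       # accumulated mean variation adjusts the direction
    m = momentum * m + adjusted               # SGD-with-momentum accumulation
    new_param = param - lr * m                # parameter update

    state.update(prev_grad=grad, v=v, m=m)
    return new_param, state

In a training loop, state would start as an empty dict per parameter tensor, and sgd_md_step would be called once per mini-batch after back-propagation.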
Pages: 3939 - 3953
Number of pages: 14
Related papers
50 in total
  • [21] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    arXiv, 2019
  • [22] On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
    Li, Xiaoyu
    Orabona, Francesco
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [23] Nonlinear Optimization Method Based on Stochastic Gradient Descent for Fast Convergence
    Watanabe, Takahiro
    Iima, Hitoshi
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 4198 - 4203
  • [24] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [25] Difference-enhanced adaptive momentum methods for non-convex stochastic optimization in image classification
    Ouyang, Chen
    Jian, Ailun
    Zhao, Xiong
    Yuan, Gonglin
    DIGITAL SIGNAL PROCESSING, 2025, 161
  • [26] Gradient descent with adaptive momentum for active contour models
    Liu, Guoqi
    Zhou, Zhiheng
    Zhong, Huiqiang
    Xie, Shengli
    IET COMPUTER VISION, 2014, 8 (04) : 287 - 298
  • [27] BACKPROPAGATION AND STOCHASTIC GRADIENT DESCENT METHOD
    AMARI, S
    NEUROCOMPUTING, 1993, 5 (4-5) : 185 - 196
  • [28] Stochastic gradient descent for optimization for nuclear systems
    Williams, Austin
    Walton, Noah
    Maryanski, Austin
    Bogetic, Sandra
    Hines, Wes
    Sobes, Vladimir
    SCIENTIFIC REPORTS, 2023, 13
  • [29] Ant colony optimization and stochastic gradient descent
    Meuleau, N
    Dorigo, M
    ARTIFICIAL LIFE, 2002, 8 (02) : 103 - 121
  • [30] Stochastic gradient descent for wind farm optimization
    Quick, Julian
    Rethore, Pierre-Elouan
    Pedersen, Mads Molgaard
    Rodrigues, Rafael Valotta
    Friis-Moller, Mikkel
    WIND ENERGY SCIENCE, 2023, 8 (08) : 1235 - 1250