A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited by: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliations
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference
DOI
Not available
Abstract
Both adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are widely used in deep learning. The former converge quickly but to lower accuracy, whereas the latter converge more slowly but to higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is based on the idea of differencing: we compute the difference between adjacent mini-batch gradients and maintain an exponential moving average of these gradient variations. The accumulated mean variation is then added to the current gradient, adjusting the convergence direction of the algorithm and accelerating its convergence. Experiments against other popular optimization methods show that, by exploiting the difference, SGD(MD) is significantly superior to SGD(M) and is close to, and sometimes better than, adaptive methods such as Adam and RAdam.
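
The abstract describes SGD(MD) only at a high level, so the following Python sketch illustrates one way such an update could look: the difference between adjacent mini-batch gradients is folded into an exponential moving average, which is added to the current gradient before a standard momentum step. The function name sgd_md_step, the state layout, and the coefficients lr, momentum, and beta are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def sgd_md_step(param, grad, state, lr=0.01, momentum=0.9, beta=0.9):
    # Hypothetical single-parameter update in the spirit of SGD(MD);
    # the precise rule and coefficients are assumptions based on the abstract.
    prev_grad = state.get("prev_grad", np.zeros_like(grad))
    v = state.get("v", np.zeros_like(grad))   # EMA of gradient differences
    m = state.get("m", np.zeros_like(grad))   # classical momentum buffer

    diff = grad - prev_grad                   # difference of adjacent mini-batch gradients
    v = beta * v + (1.0 - beta) * diff        # exponential moving average of the variation
    adjusted = grad + v                       # accumulated mean variation adjusts the direction
    m = momentum * m + adjusted               # SGD-with-momentum accumulation
    new_param = param - lr * m                # parameter update

    state.update(prev_grad=grad, v=v, m=m)
    return new_param, state

In a training loop, state would start as an empty dict per parameter tensor, and sgd_md_step would be called once per mini-batch after back-propagation.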
Pages: 3939 - 3953
Number of pages: 14
Related papers
50 in total
  • [21] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    arXiv, 2019
  • [22] On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
    Li, Xiaoyu
    Orabona, Francesco
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [23] Nonlinear Optimization Method Based on Stochastic Gradient Descent for Fast Convergence
    Watanabe, Takahiro
    Iima, Hitoshi
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 4198 - 4203
  • [24] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [25] Difference-enhanced adaptive momentum methods for non-convex stochastic optimization in image classification
    Ouyang, Chen
    Jian, Ailun
    Zhao, Xiong
    Yuan, Gonglin
    DIGITAL SIGNAL PROCESSING, 2025, 161
  • [26] Gradient descent with adaptive momentum for active contour models
    Liu, Guoqi
    Zhou, Zhiheng
    Zhong, Huiqiang
    Xie, Shengli
    IET COMPUTER VISION, 2014, 8 (04) : 287 - 298
  • [27] BACKPROPAGATION AND STOCHASTIC GRADIENT DESCENT METHOD
    AMARI, S
    NEUROCOMPUTING, 1993, 5 (4-5) : 185 - 196
  • [28] Stochastic gradient descent for optimization for nuclear systems
    Williams, Austin
    Walton, Noah
    Maryanski, Austin
    Bogetic, Sandra
    Hines, Wes
    Sobes, Vladimir
    SCIENTIFIC REPORTS, 2023, 13
  • [29] Ant colony optimization and stochastic gradient descent
    Meuleau, N
    Dorigo, M
    ARTIFICIAL LIFE, 2002, 8 (02) : 103 - 121
  • [30] Stochastic gradient descent for wind farm optimization
    Quick, Julian
    Rethore, Pierre-Elouan
    Pedersen, Mads Molgaard
    Rodrigues, Rafael Valotta
    Friis-Moller, Mikkel
    WIND ENERGY SCIENCE, 2023, 8 (08) : 1235 - 1250