A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited by: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliations
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference
DOI
Not available
Abstract
Adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are widely used in deep learning. The former converge quickly but to lower accuracy, while the latter converge more slowly but reach higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is based on the idea of differencing. We take the difference between the gradients of adjacent mini-batches and maintain an exponential moving average of these gradient variations. The accumulated mean variation is then added to the current gradient, adjusting the convergence direction of the algorithm and accelerating its convergence. Experiments comparing SGD(MD) with other popular optimization methods show that, by exploiting the difference, SGD(MD) is significantly superior to SGD(M) and close to, and sometimes better than, adaptive methods including Adam and RAdam.
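The abstract describes the update only in words, so the following is a minimal NumPy sketch of one possible difference-adjusted momentum step. The class name SGDMD, the hyperparameters lr, momentum, and beta, and the exact way the terms are combined are assumptions made for illustration; the paper's actual equations may differ.

```python
import numpy as np

class SGDMD:
    """Illustrative sketch (not the paper's exact algorithm) of an
    SGD-with-momentum step adjusted by a moving average of gradient
    differences between adjacent mini-batches."""

    def __init__(self, lr=0.01, momentum=0.9, beta=0.9):
        self.lr = lr              # learning rate (assumed name/value)
        self.momentum = momentum  # momentum coefficient for the velocity term
        self.beta = beta          # decay rate for the moving average of differences
        self.prev_grad = None     # gradient from the previous mini-batch
        self.diff_avg = None      # exponential moving average of gradient differences
        self.velocity = None      # momentum buffer

    def step(self, params, grad):
        if self.prev_grad is None:
            self.prev_grad = np.zeros_like(grad)
            self.diff_avg = np.zeros_like(grad)
            self.velocity = np.zeros_like(grad)

        # Difference between the current and previous mini-batch gradients.
        diff = grad - self.prev_grad
        # Exponential moving average of the gradient variations.
        self.diff_avg = self.beta * self.diff_avg + (1.0 - self.beta) * diff
        # Add the accumulated mean variation to the current gradient
        # to adjust the convergence direction (assumed combination).
        adjusted_grad = grad + self.diff_avg
        # Standard momentum update applied to the adjusted gradient.
        self.velocity = self.momentum * self.velocity + adjusted_grad
        params -= self.lr * self.velocity

        self.prev_grad = grad.copy()
        return params
```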
Pages: 3939-3953
Page count: 14