A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliation
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference
DOI
Not available
Abstract
Adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are both widely used in deep learning. The former converge quickly but to lower accuracy, while the latter converge more slowly but reach higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is based on the idea of differencing. We take the difference between adjacent mini-batch gradients and maintain an exponential moving average of these gradient variations; the accumulated mean variation is then added to the current gradient to adjust the convergence direction of the algorithm and accelerate its convergence. Experimental results comparing SGD(MD) with other popular optimizers show that, thanks to the differencing scheme, it is significantly superior to SGD(M) and close to, and sometimes better than, adaptive methods such as Adam and RAdam.
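The abstract only outlines the update rule, so the following is a minimal sketch in Python/NumPy of how such a momentum-plus-difference step could look. The hyperparameter names (lr, beta for the momentum buffer, gamma for the difference average) and the exact way the accumulated variation is combined with the gradient and momentum term are assumptions for illustration, not the paper's definition of SGD(MD).

```python
import numpy as np

class SGDMD:
    """Illustrative sketch of an SGD-with-momentum-and-difference step.

    Based only on the abstract: keep an exponential moving average of the
    differences between adjacent mini-batch gradients and add that
    accumulated variation to the current gradient before a classical
    momentum update. Hyperparameter names and the exact combination rule
    are assumptions, not the paper's specification.
    """

    def __init__(self, lr=0.01, beta=0.9, gamma=0.9):
        self.lr = lr            # learning rate
        self.beta = beta        # momentum coefficient (assumed)
        self.gamma = gamma      # EMA coefficient for gradient differences (assumed)
        self.prev_grad = None   # gradient from the previous mini-batch
        self.diff_ema = None    # EMA of gradient differences
        self.momentum = None    # classical momentum buffer

    def step(self, params, grad):
        if self.prev_grad is None:
            # First step: no previous gradient yet, initialize buffers to zero.
            self.prev_grad = np.zeros_like(grad)
            self.diff_ema = np.zeros_like(grad)
            self.momentum = np.zeros_like(grad)

        # Difference between adjacent mini-batch gradients.
        diff = grad - self.prev_grad
        # Exponential moving average of the gradient variations.
        self.diff_ema = self.gamma * self.diff_ema + (1.0 - self.gamma) * diff
        # Add the accumulated mean variation to the current gradient (assumed form).
        adjusted_grad = grad + self.diff_ema
        # Standard momentum update on the adjusted gradient.
        self.momentum = self.beta * self.momentum + adjusted_grad
        params -= self.lr * self.momentum

        self.prev_grad = grad.copy()
        return params
```

In a training loop one would call params = opt.step(params, grad) once per mini-batch; with gamma = 0 the correction term vanishes and the sketch reduces to plain SGD with momentum.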
Pages: 3939 - 3953
Page count: 14
Related Papers
50 items in total
  • [31] Stochastic Chebyshev Gradient Descent for Spectral Optimization
    Han, Insu
    Avron, Haim
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [32] Design of Momentum Fractional Stochastic Gradient Descent for Recommender Systems
    Khan, Zeshan Aslam
    Zubair, Syed
    Alquhayz, Hani
    Azeem, Muhammad
    Ditta, Allah
    IEEE ACCESS, 2019, 7 : 179575 - 179590
  • [33] Stochastic gradient descent for optimization for nuclear systems
    Williams, Austin
    Walton, Noah
    Maryanski, Austin
    Bogetic, Sandra
    Hines, Wes
    Sobes, Vladimir
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [34] A stochastic variance reduced gradient method with adaptive step for stochastic optimization
    Li, Jing
    Xue, Dan
    Liu, Lei
    Qi, Rulei
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2024, 45 (03) : 1327 - 1342
  • [35] An adaptive enhancement method based on stochastic parallel gradient descent of glioma image
    Wang, Hongfei
    Peng, Xinhao
    Ma, ShiQing
    Wang, Shuai
    Xu, Chuan
    Yang, Ping
    IET IMAGE PROCESSING, 2023, 17 (14) : 3976 - 3985
  • [36] Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum
    Cong, Guojing
    Liu, Tianyi
    2020 IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2020) AND WORKSHOP ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR SCIENTIFIC APPLICATIONS (AI4S 2020), 2020, : 29 - 39
  • [37] Adaptive tuning of fuzzy membership functions for non-linear optimization using gradient descent method
    Vishnupad, PS
    Shin, YC
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 1999, 7 (01) : 13 - 25
  • [38] Adaptive Polyak Step-Size for Momentum Accelerated Stochastic Gradient Descent With General Convergence Guarantee
    Zhang, Jiawei
    Jin, Cheng
    Gu, Yuantao
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2025, 73 : 462 - 476
  • [39] Bandwidth estimation for adaptive optical systems based on stochastic parallel gradient descent optimization
    Yu, M
    Vorontsov, MA
    ADVANCED WAVEFRONT CONTROL: METHODS, DEVICES, AND APPLICATIONS II, 2004, 5553 : 189 - 199
  • [40] Fractional-order stochastic gradient descent method with momentum and energy for deep neural networks
    Zhou, Xingwen
    You, Zhenghao
    Sun, Weiguo
    Zhao, Dongdong
    Yan, Shi
    NEURAL NETWORKS, 2025, 181