A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference

Cited by: 0
Authors
Wei Yuan
Fei Hu
Liangfu Lu
Affiliations
[1] Tianjin University, School of Mathematics
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Optimization method; Deep learning; Stochastic gradient; Difference;
DOI
Not available
Abstract
Adaptive optimization methods (such as AdaGrad, RMSProp, Adam, and RAdam) and non-adaptive optimization methods (such as SGD and SGD with momentum) are both widely used in deep learning. The former typically converge quickly but reach lower final accuracy, whereas the latter converge more slowly but achieve higher accuracy. We propose a new non-adaptive method, stochastic gradient descent with momentum and difference (SGD(MD)), which is built on the idea of differencing. The method computes the difference between the gradients of adjacent mini-batches, maintains an exponential moving average of these gradient variations, and adds the accumulated mean variation to the current gradient, thereby adjusting the convergence direction and accelerating convergence. Experiments comparing SGD(MD) with other popular optimizers show that the differencing scheme makes SGD(MD) significantly better than SGD(M) and close to, and sometimes better than, adaptive methods such as Adam and RAdam.
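The abstract suggests a simple update rule: track an exponential moving average of the change between consecutive mini-batch gradients and add it to the current gradient before a standard momentum step. The sketch below illustrates that idea in Python; the exact combination rule and the hyperparameter names (beta for the EMA factor, lam for scaling the variation term) are assumptions for illustration, not the formula published in the paper.

```python
import numpy as np

def sgd_md_step(param, grad, state, lr=0.01, mu=0.9, beta=0.9, lam=1.0):
    """One SGD(MD)-style update step (illustrative sketch, not the paper's exact rule).

    Following the abstract: the difference between adjacent mini-batch gradients
    is tracked with an exponential moving average, and that accumulated mean
    variation is added to the current gradient before a classic momentum update.
    beta, lam, and the additive combination are assumptions made for this sketch.
    """
    prev_grad = state.get("prev_grad", np.zeros_like(grad))
    ema_diff = state.get("ema_diff", np.zeros_like(grad))
    velocity = state.get("velocity", np.zeros_like(grad))

    diff = grad - prev_grad                          # difference of adjacent mini-batch gradients
    ema_diff = beta * ema_diff + (1 - beta) * diff   # EMA of the gradient variation
    adjusted_grad = grad + lam * ema_diff            # adjust the convergence direction

    velocity = mu * velocity - lr * adjusted_grad    # standard momentum step
    param = param + velocity

    state.update(prev_grad=grad, ema_diff=ema_diff, velocity=velocity)
    return param, state

if __name__ == "__main__":
    # Toy quadratic objective f(w) = 0.5 * ||w||^2 with noisy gradients.
    rng = np.random.default_rng(0)
    w, state = np.ones(5), {}
    for _ in range(200):
        noisy_grad = w + 0.01 * rng.standard_normal(5)
        w, state = sgd_md_step(w, noisy_grad, state)
    print("final ||w|| =", np.linalg.norm(w))
```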
Pages: 3939-3953
Page count: 14
Related Papers
50 records in total
  • [1] A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference
    Yuan, Wei
    Hu, Fei
    Lu, Liangfu
    APPLIED INTELLIGENCE, 2022, 52 (04) : 3939 - 3953
  • [2] ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
    Srinivasan, Vishwak
    Sankar, Adepu Ravi
    Balasubramanian, Vineeth N.
    PROCEEDINGS OF THE ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA (CODS-COMAD'18), 2018, : 249 - 256
  • [3] Adaptive Stochastic Gradient Descent Method for Convex and Non-Convex Optimization
    Chen, Ruijuan
    Tang, Xiaoquan
    Li, Xiuting
    FRACTAL AND FRACTIONAL, 2022, 6 (12)
  • [4] The combination of particle swarm optimization and stochastic gradient descent with momentum
    Chen, Chi-Hua
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2022, 18 : 132 - 132
  • [5] On the Hyperparameters in Stochastic Gradient Descent with Momentum
    Shi, Bin
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [6] On the Generalization of Stochastic Gradient Descent with Momentum
    Ramezani-Kebrya, Ali
    Antonakopoulos, Kimon
    Cevher, Volkan
    Khisti, Ashish
    Liang, Ben
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 56
  • [7] Adaptive Sampling for Incremental Optimization Using Stochastic Gradient Descent
    Papa, Guillaume
    Bianchi, Pascal
    Clemencon, Stephan
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 317 - 331
  • [8] Stochastic Gradient Descent on a Tree: an Adaptive and Robust Approach to Stochastic Convex Optimization
    Vakili, Sattar
    Salgia, Sudeep
    Zhao, Qing
    2019 57TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2019, : 432 - 438
  • [9] Adaptive compensation of the effects of non-stationary thermal blooming based on the stochastic parallel gradient descent optimization method
    Carhart, GW
    Simer, GJ
    Vorontsov, MA
    ADVANCED WAVEFRONT CONTROL: METHODS, DEVICES, AND APPLICATIONS, 2003, 5162 : 28 - 36
  • [10] A stochastic gradient tracking algorithm with adaptive momentum for distributed optimization
    Li, Yantao
    Hu, Hanqing
    Zhang, Keke
    Lu, Qingguo
    Deng, Shaojiang
    Li, Huaqing
    NEUROCOMPUTING, 2025, 637