An improved adaptive momentum gradient descent algorithm

Authors
Jiang Z. [1]
Song J. [1]
Liu Y. [2]
Affiliations
[1] School of Mathematics and Statistics, Changchun University of Science and Technology, Changchun
[2] CEC GienTech Technology Co. Ltd., Beijing
Keywords
Adam algorithm; angle information; global convergence; gradient descent algorithm; machine learning; regret bound
DOI
10.13245/j.hust.239004
Abstract
To address the poor global convergence of the Adam algorithm, an AngleAdam algorithm with angular coefficients is proposed. The algorithm uses the angle between two consecutive gradients to adaptively control the step size, which mitigates Adam's poor global convergence to some extent and improves its optimization ability. Within the online learning framework, the convergence of the algorithm is analyzed in terms of its regret bound, and AngleAdam is proved to have sublinear regret. The optimization ability of AngleAdam is tested on three constructed non-convex functions and on a deep neural network model. Experimental results show that the algorithm obtains better optimization results. © 2023 Huazhong University of Science and Technology. All rights reserved.
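The abstract does not give the AngleAdam update rule, so the following is only a minimal sketch of the general idea it describes: a standard Adam step whose size is scaled by a coefficient derived from the angle between the previous and current gradients. The names `cosine_between` and `angle_scaled_adam`, the scaling rule `0.5 * (1 + cos θ)`, and the `min_scale` floor are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def cosine_between(g_prev, g, eps=1e-12):
    """Cosine of the angle between two gradient vectors (0.0 if either is ~zero)."""
    denom = np.linalg.norm(g_prev) * np.linalg.norm(g)
    if denom < eps:
        return 0.0
    return float(np.clip(np.dot(g_prev, g) / denom, -1.0, 1.0))


def angle_scaled_adam(grad_fn, x0, steps=2000, lr=0.05,
                      beta1=0.9, beta2=0.999, eps=1e-8, min_scale=0.1):
    """Adam with a hypothetical angle-based step-size coefficient.

    This is NOT the published AngleAdam rule; the coefficient below is an
    illustrative assumption: full step when consecutive gradients agree,
    reduced step when they point in opposing directions.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)       # first-moment estimate
    v = np.zeros_like(x)       # second-moment estimate
    g_prev = np.zeros_like(x)  # gradient from the previous iteration
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction, as in standard Adam
        v_hat = v / (1 - beta2 ** t)
        # Assumed angular coefficient in [min_scale, 1].
        scale = max(min_scale, 0.5 * (1.0 + cosine_between(g_prev, g)))
        x -= lr * scale * m_hat / (np.sqrt(v_hat) + eps)
        g_prev = g
    return x


# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x_final = angle_scaled_adam(lambda x: 2.0 * x, x0=[3.0, -2.0])
```

On this convex toy problem the sketch behaves much like plain Adam; the angular coefficient only matters when consecutive gradients disagree, which is where the abstract says the step size is adapted.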
Pages: 137-143
Page count: 6