An improved adaptive momentum gradient descent algorithm

被引：0

作者：

Jiang Z. ^{[1
]}

Song J. ^{[1
]}

Liu Y. ^{[2
]}

机构：

[1] School of Mathematics and Statistics, Changchun University of Science and Technology, Changchun

[2] CEC GienTech Technology Co. Ltd., Beijing

来源：

Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition) | 2023年 / 51卷 / 05期

关键词：

Adam algorithm; angle information; global convergence; gradient descent algorithm; machine learning; regret bound;

D O I：

10.13245/j.hust.239004

中图分类号：

学科分类号：

摘要：

To improve the poor global convergence of Adam algorithm，an AngleAdam algorithm with angular coefficients was proposed． The algorithm used the angle information between two continuous gradients to adaptively control the step size，which improved the problem of poor global convergence of Adam algorithm to a certain extent，and improved the optimization ability．By using the online learning framework，the convergence of the algorithm was analyzed from the perspective of regret bound，proving that the AngleAdam had sublinear regret．Based on the constructed three non-convex functions and the depth neural network model，the optimization ability of the AngleAdam algorithm was tested． Experimental results show that the algorithm can obtain better optimization results. © 2023 Huazhong University of Science and Technology. All rights reserved.

引用

页码：137 / 143

页数：6

共 14 条

[1] 35, 1, pp. 223-231, (2018)
[2] 37, 9, pp. 1212-1217, (2015)
[3] 49, 6, pp. 43-49
[4] 49, 2, pp. 51-55
[5] NESTEROV Y．, A method of solving a convex programing problem with convergence rate[J], Soviet Mathematics Doklady, 27, 2, pp. 372-376, (1983)
[6] DUCHI J, SINGER Y．, Adaptive subgradient methods for online learning and stochastic optimization[J], Journal of Machine Learning Research, 12, 7, pp. 2121-2159, (2011)
[7] ZEILER M D．, ADADELTA：an adaptive learning rate method
[8] TIELEMAN T，, HINTON G．, RMSProp： divide the gradient by a running average of its recent magnitude[R], (2012)
[9] KINGMA D，, BA J．, Adam：a method for stochastic optimization[C], Proc of the 3rd International Conference on Learning Representations, pp. 1-15, (2015)
[10] LOSHCHILOV I, Decoupled weight decay regularization[C], Proc of the 7th Int Conf for Learning Representations, pp. 1-19, (2019)

← 1 2 →