共 14 条
- [1] 35, 1, pp. 223-231, (2018)
- [2] 37, 9, pp. 1212-1217, (2015)
- [3] 49, 6, pp. 43-49
- [4] 49, 2, pp. 51-55
- [5] NESTEROV Y., A method of solving a convex programing problem with convergence rate[J], Soviet Mathematics Doklady, 27, 2, pp. 372-376, (1983)
- [6] DUCHI J, SINGER Y., Adaptive subgradient methods for online learning and stochastic optimization[J], Journal of Machine Learning Research, 12, 7, pp. 2121-2159, (2011)
- [7] ZEILER M D., ADADELTA:an adaptive learning rate method
- [8] TIELEMAN T,, HINTON G., RMSProp: divide the gradient by a running average of its recent magnitude[R], (2012)
- [9] KINGMA D,, BA J., Adam:a method for stochastic optimization[C], Proc of the 3rd International Conference on Learning Representations, pp. 1-15, (2015)
- [10] LOSHCHILOV I, Decoupled weight decay regularization[C], Proc of the 7th Int Conf for Learning Representations, pp. 1-19, (2019)