Linear Convergence of Adaptive Stochastic Gradient Descent

Cited by: 0
Authors
Xie, Yuege [1 ]
Wu, Xiaoxia [2 ]
Ward, Rachel [1 ,2 ]
Affiliations
[1] UT Austin, Oden Inst, Austin, TX 78712 USA
[2] UT Austin, Dept Math, Austin, TX USA
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that satisfy the Polyak-Łojasiewicz (PL) inequality. The paper introduces the notion of the Restricted Uniform Inequality of Gradients (RUIG), a measure of how balanced the stochastic gradient norms are, to characterize the landscape of a function. RUIG plays a key role in proving the robustness of AdaGrad-Norm to its hyperparameter tuning in the stochastic setting. On top of RUIG, we develop a two-stage framework to prove the linear convergence of AdaGrad-Norm without knowing the parameters of the objective functions. This framework can likely be extended to other adaptive stepsize algorithms. Numerical experiments validate the theory and suggest future directions for improvement.
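For context, the sketch below illustrates the standard AdaGrad-Norm update analyzed in the abstract: a single scalar stepsize denominator that accumulates squared gradient norms, so no smoothness or PL constants need to be known in advance. The quadratic test objective, the initialization b0 = 1e-2, and the helper name adagrad_norm are illustrative assumptions for this sketch, not the paper's experimental setup; a full-batch gradient stands in for the stochastic gradient.

```python
import numpy as np

def adagrad_norm(grad, x0, b0=1e-2, eta=1.0, n_steps=1000):
    """Sketch of AdaGrad-Norm: SGD with one adaptive scalar stepsize.

    Standard update:
        b_{t+1}^2 = b_t^2 + ||g_t||^2
        x_{t+1}   = x_t - (eta / b_{t+1}) * g_t
    No smoothness or PL constants are required up front; robustness to
    the initialization b0 is what the paper's analysis targets.
    """
    x, b2 = x0.astype(float), b0 ** 2
    for _ in range(n_steps):
        g = grad(x)                      # (stochastic) gradient at x; full-batch here
        b2 += np.dot(g, g)               # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b2)) * g  # adaptively damped gradient step
    return x

# Illustrative run on the strongly convex quadratic f(x) = 0.5 * ||A x||^2,
# which satisfies the PL inequality 0.5 * ||grad f(x)||^2 >= mu * (f(x) - f*)
# with mu = lambda_min(A^T A) = 1 and f* = 0.
A = np.diag([1.0, 3.0])
grad_f = lambda x: A.T @ (A @ x)
print(adagrad_norm(grad_f, x0=np.array([5.0, -3.0])))  # approaches (0, 0)
```

Roughly, even when b0 is set far too small and early steps overshoot, the accumulated gradient norms grow until the effective stepsize eta/b_t drops below a stable threshold, after which convergence is linear; this is the two-stage behavior the abstract refers to.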
Pages: 10
Related Papers (50 in total)
  • [11] Convergence of Stochastic Gradient Descent in Deep Neural Network
    Zhou, Bai-cun
    Han, Cong-ying
    Guo, Tian-de
    ACTA MATHEMATICAE APPLICATAE SINICA-ENGLISH SERIES, 2021, 37 (01): 126-136
  • [12] Optimized convergence of stochastic gradient descent by weighted averaging
    Hagedorn, Melinda
    Jarre, Florian
    OPTIMIZATION METHODS & SOFTWARE, 2024, 39 (04): 699-724
  • [13] Convergence analysis of distributed stochastic gradient descent with shuffling
    Meng, Qi
    Chen, Wei
    Wang, Yue
    Ma, Zhi-Ming
    Liu, Tie-Yan
    NEUROCOMPUTING, 2019, 337: 46-57
  • [14] Fast Convergence Stochastic Parallel Gradient Descent Algorithm
    Hu, Dongting
    Shen, Wen
    Ma, Wenchao
    Liu, Xinyu
    Su, Zhouping
    Zhu, Huaxin
    Zhang, Xiumei
    Que, Lizhi
    Zhu, Zhuowei
    Zhang, Yixin
    Chen, Guoqing
    Hu, Lifa
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (12)
  • [15] On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum
    Gao, Hongchang
    Li, Junyi
    Huang, Heng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [16] The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
    Alistarh, Dan
    De Sa, Christopher
    Konstantinov, Nikola
    PODC'18: PROCEEDINGS OF THE 2018 ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING, 2018: 169-177
  • [17] On the Convergence of Decentralized Stochastic Gradient Descent With Biased Gradients
    Jiang, Yiming
    Kang, Helei
    Liu, Jinlan
    Xu, Dongpo
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2025, 73: 549-558
  • [18] CONVERGENCE OF RIEMANNIAN STOCHASTIC GRADIENT DESCENT ON HADAMARD MANIFOLD
    Sakai, Hiroyuki
    Iiduka, Hideaki
    PACIFIC JOURNAL OF OPTIMIZATION, 2024, 20 (04): 743-767
  • [20] Convergence behavior of diffusion stochastic gradient descent algorithm
    Barani, Fatemeh
    Savadi, Abdorreza
    Yazdi, Hadi Sadoghi
    SIGNAL PROCESSING, 2021, 183