Linear Convergence of Adaptive Stochastic Gradient Descent

Cited by: 0
Authors
Xie, Yuege [1 ]
Wu, Xiaoxia [2 ]
Ward, Rachel [1 ,2 ]
Affiliations
[1] UT Austin, Oden Inst, Austin, TX 78712 USA
[2] UT Austin, Dept Math, Austin, TX USA
DOI: Not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that satisfy the Polyak-Łojasiewicz (PL) inequality. The paper introduces the notion of Restricted Uniform Inequality of Gradients (RUIG), a measure of the balancedness of the stochastic gradient norms, to characterize the landscape of a function. RUIG plays a key role in proving the robustness of AdaGrad-Norm to hyperparameter tuning in the stochastic setting. On top of RUIG, we develop a two-stage framework to prove the linear convergence of AdaGrad-Norm without knowledge of the parameters of the objective function. This framework can likely be extended to other adaptive stepsize algorithms. Numerical experiments validate the theory and suggest future directions for improvement.
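To connect the abstract to the iteration it analyzes: a function f satisfies the PL inequality with constant μ > 0 when (1/2)‖∇f(x)‖² ≥ μ(f(x) − f*), which every strongly convex function does. Below is a minimal sketch of the AdaGrad-Norm recursion b_{t+1}² = b_t² + ‖g_t‖², x_{t+1} = x_t − (η / b_{t+1}) g_t on a toy least-squares problem. The problem instance, the constants eta and b0, and the iteration budget are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of AdaGrad-Norm on a toy least-squares objective
# f(x) = (1/2n) * sum_i (a_i^T x - y_i)^2, which is strongly convex
# (hence satisfies the PL inequality). Instance and constants are
# illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
y = A @ x_star  # interpolation regime: zero loss at x_star

def stochastic_grad(x, batch=10):
    """Mini-batch stochastic gradient of the least-squares loss."""
    idx = rng.choice(n, size=batch, replace=False)
    return A[idx].T @ (A[idx] @ x - y[idx]) / batch

eta, b = 1.0, 0.1  # stepsize scale eta and initial accumulator b_0
x = np.zeros(d)
for t in range(2000):
    g = stochastic_grad(x)
    b = np.sqrt(b**2 + np.dot(g, g))  # b_{t+1}^2 = b_t^2 + ||g_t||^2
    x = x - (eta / b) * g             # x_{t+1} = x_t - (eta / b_{t+1}) g_t

print("final loss:", 0.5 * np.mean((A @ x - y) ** 2))
```

Rerunning this sketch with a very different b0 (say 100 or 0.001) should still converge on this instance, loosely illustrating the robustness to hyperparameter tuning claimed in the abstract: in the two-stage picture, the accumulator b_t first grows until the effective stepsize η/b_t is small enough, after which the error contracts linearly.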
Pages: 10
Related papers (50 in total):
  • [31] Adaptive Stochastic Gradient Descent (SGD) for erratic datasets
    Dagal, Idriss
    Tanrioven, Kursat
    Nayir, Ahmet
    Akin, Burak
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, 166
  • [33] Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
    Mitra, Partha P.
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018 : 1890 - 1894
  • [34] Decentralized Asynchronous Stochastic Gradient Descent: Convergence Rate Analysis
    Bedi, Amrit Singh
    Pradhan, Hrusikesha
    Rajawat, Ketan
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018 : 402 - 406
  • [35] Adaptive Stochastic Gradient Descent Optimisation for Image Registration
    Klein, Stefan
    Pluim, Josien P. W.
    Staring, Marius
    Viergever, Max A.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2009, 81 (03) : 227 - 239
  • [36] ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
    Srinivasan, Vishwak
    Sankar, Adepu Ravi
    Balasubramanian, Vineeth N.
    PROCEEDINGS OF THE ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA (CODS-COMAD'18), 2018 : 249 - 256
  • [37] Adaptive Polyak Step-Size for Momentum Accelerated Stochastic Gradient Descent With General Convergence Guarantee
    Zhang, Jiawei
    Jin, Cheng
    Gu, Yuantao
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2025, 73 : 462 - 476
  • [38] Stochastic Gradient Descent for Linear Systems with Missing Data
    Ma, Anna
    Needell, Deanna
    NUMERICAL MATHEMATICS-THEORY METHODS AND APPLICATIONS, 2019, 12 (01) : 1 - 20
  • [39] STOCHASTIC CONVERGENCE PROPERTIES OF THE ADAPTIVE GRADIENT LATTICE
    SOHIE, GRL
    SIBUL, LH
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (01) : 102 - 107
  • [40] Global Convergence of Gradient Descent for Deep Linear Residual Networks
    Wu, Lei
    Wang, Qingcan
    Ma, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32