On the global convergence rate of the gradient descent method for functions with Hölder continuous gradients

Cited by: 0
Authors
Maryam Yashtini
Affiliations
[1] Georgia Institute of Technology, School of Mathematics
Source
Optimization Letters | 2016 / Vol. 10
Keywords
Nonlinear programming; Gradient descent method; Global convergence; Hölder continuous gradient; Convergence rate; Upper complexity bound;
DOI
Not available
Abstract
The gradient descent method minimizes an unconstrained nonlinear optimization problem at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ is the number of iterations performed by the gradient method. Traditionally, this analysis is obtained for smooth objective functions having Lipschitz continuous gradients. This paper considers a more general class of nonlinear programming problems in which the functions have Hölder continuous gradients. More precisely, for any function $f$ in this class, denoted by $\mathcal{C}^{1,\nu}_L$, there exist $\nu \in (0,1]$ and $L>0$ such that the relation $\Vert \nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert \le L\Vert \mathbf{x}-\mathbf{y}\Vert^{\nu}$ holds for all $\mathbf{x},\mathbf{y}\in\mathbb{R}^n$. We prove that the gradient descent method converges globally to a stationary point and exhibits a convergence rate of $\mathcal{O}(1/K^{\frac{\nu}{\nu+1}})$ when the step-size is chosen properly, i.e., less than $\left[\frac{\nu+1}{L}\right]^{\frac{1}{\nu}}\Vert \nabla f(\mathbf{x}_k)\Vert^{\frac{1}{\nu}-1}$. Moreover, the algorithm requires $\mathcal{O}(1/\epsilon^{\frac{1}{\nu}+1})$ calls to an oracle to find $\bar{\mathbf{x}}$ such that $\Vert \nabla f(\bar{\mathbf{x}})\Vert < \epsilon$.
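
The step-size rule quoted in the abstract lends itself to a short sketch. The Python snippet below iterates x_{k+1} = x_k - t_k ∇f(x_k) with t_k chosen strictly below the bound [(ν+1)/L]^{1/ν} ‖∇f(x_k)‖^{1/ν-1} and stops once ‖∇f(x_k)‖ < ε, matching the oracle criterion above. The function name holder_gradient_descent, the 0.5 safety factor, and the illustrative test problem and constant L are choices made here for illustration, not taken from the paper.

import numpy as np

def holder_gradient_descent(grad_f, x0, L, nu, tol=1e-6, max_iter=10_000):
    """Gradient descent for a function with a nu-Hölder continuous gradient.

    The step size follows the bound quoted in the abstract,
        t_k < [(nu + 1) / L]^(1/nu) * ||grad f(x_k)||^(1/nu - 1),
    scaled by a 0.5 safety factor so the strict inequality holds.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:                      # stop when ||grad f(x)|| < epsilon
            break
        t = 0.5 * ((nu + 1.0) / L) ** (1.0 / nu) * gnorm ** (1.0 / nu - 1.0)
        x = x - t * g                        # gradient step
    return x, k

# Illustrative test problem: f(x) = ||x||^{1+nu} / (1+nu), whose gradient
# ||x||^{nu-1} x is nu-Hölder continuous; L = 2.0 is a loose, not tight, constant.
nu, L = 0.5, 2.0
grad_f = lambda x: np.linalg.norm(x) ** (nu - 1.0) * x if np.any(x) else np.zeros_like(x)
x_bar, iters = holder_gradient_descent(grad_f, x0=np.array([2.0, -1.0]), L=L, nu=nu)

On this test problem the update contracts the iterate toward the origin, so the gradient norm drops below the tolerance after finitely many steps, consistent with the complexity bound stated above.
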
Pages: 1361-1370
Number of pages: 9
Related papers
50 records in total
  • [41] Convergence Rates of Zeroth Order Gradient Descent for Łojasiewicz Functions
    Wang, Tianyu
    Feng, Yasong
    INFORMS JOURNAL ON COMPUTING, 2024, 36 (06) : 1611 - 1633
  • [42] Some sufficient descent conjugate gradient methods and their global convergence
    Min Li
    Aiping Qu
    Computational and Applied Mathematics, 2014, 33 : 333 - 347
  • [43] Some sufficient descent conjugate gradient methods and their global convergence
    Li, Min
    Qu, Aiping
    COMPUTATIONAL & APPLIED MATHEMATICS, 2014, 33 (02): 333 - 347
  • [44] The global convergence of a new conjugate descent method
    Chen, Yuan-yuan
    Wang, Zhuo-ping
    Proceedings of the Second International Conference on Game Theory and Applications, 2007, : 22 - 25
  • [45] Global Convergence of Gradient Descent for Deep Linear Residual Networks
    Wu, Lei
    Wang, Qingcan
    Ma, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [46] ON THE GLOBAL CONVERGENCE OF RANDOMIZED COORDINATE GRADIENT DESCENT FOR NONCONVEX OPTIMIZATION
    Chen, Ziang
    Li, Yingzhou
    Lu, Jianfeng
    SIAM JOURNAL ON OPTIMIZATION, 2023, 33 (02) : 713 - 738
  • [47] Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation
    Zhang, Dejiao
    Balzano, Laura
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 1460 - 1468
  • [48] Beyond convexity-Contraction and global convergence of gradient descent
    Wensing, Patrick M.
    Slotine, Jean-Jacques
    PLOS ONE, 2020, 15 (08)
  • [49] On the Convergence of Stochastic Compositional Gradient Descent Ascent Method
    Gao, Hongchang
    Wang, Xiaoqian
    Luo, Lei
    Shi, Xinghua
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2389 - 2395
  • [50] Convergence diagnostics for stochastic gradient descent with constant learning rate
    Chee, Jerry
    Toulis, Panos
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84