On the global convergence rate of the gradient descent method for functions with Hölder continuous gradients

Cited by: 0
Author
Maryam Yashtini
Affiliation
[1] Georgia Institute of Technology, School of Mathematics
Source
Optimization Letters | 2016, Vol. 10
Keywords
Nonlinear programming; Gradient descent method; Global convergence; Hölder continuous gradient; Convergence rate; Upper complexity bound
DOI
Not available
Abstract
The gradient descent method minimizes an unconstrained nonlinear optimization problem at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ is the number of iterations performed by the gradient method. Traditionally, this analysis is obtained for smooth objective functions having Lipschitz continuous gradients. This paper considers a more general class of nonlinear programming problems in which the functions have Hölder continuous gradients. More precisely, for any function $f$ in this class, denoted by $\mathcal{C}^{1,\nu}_L$, there exist $\nu \in (0,1]$ and $L>0$ such that the relation $\Vert \nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert \le L \Vert \mathbf{x}-\mathbf{y}\Vert^{\nu}$ holds for all $\mathbf{x},\mathbf{y}\in \mathbb{R}^n$. We prove that the gradient descent method converges globally to a stationary point and exhibits a convergence rate of $\mathcal{O}(1/K^{\frac{\nu}{\nu+1}})$ when the step-size is chosen properly, i.e., less than $[\frac{\nu+1}{L}]^{\frac{1}{\nu}}\Vert \nabla f(\mathbf{x}_k)\Vert^{\frac{1}{\nu}-1}$. Moreover, the algorithm requires $\mathcal{O}(1/\epsilon^{\frac{1}{\nu}+1})$ calls to an oracle to find a point $\bar{\mathbf{x}}$ such that $\Vert \nabla f(\bar{\mathbf{x}})\Vert < \epsilon$.
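A minimal sketch of the step-size rule described in the abstract, assuming a NumPy setting: the step $h_k$ is kept strictly below $[\frac{\nu+1}{L}]^{1/\nu}\Vert\nabla f(\mathbf{x}_k)\Vert^{1/\nu-1}$ via a safety factor. The function name `gradient_descent_holder`, the `safety` parameter, and the test problem $f(\mathbf{x})=\Vert\mathbf{x}\Vert^{1+\nu}/(1+\nu)$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gradient_descent_holder(grad_f, x0, L, nu, eps=1e-6, max_iter=10_000, safety=0.5):
    """Gradient descent for f with an (L, nu)-Hoelder continuous gradient.

    Sketch only: the step-size stays strictly below
    ((nu + 1) / L)**(1/nu) * ||grad f(x_k)||**(1/nu - 1),
    the bound quoted in the abstract, via `safety` in (0, 1).
    Stops once ||grad f(x)|| < eps, the oracle-complexity criterion.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        gnorm = np.linalg.norm(g)
        if gnorm < eps:                  # ||grad f(x_bar)|| < eps reached
            return x, k
        h = safety * ((nu + 1.0) / L) ** (1.0 / nu) * gnorm ** (1.0 / nu - 1.0)
        x = x - h * g                    # x_{k+1} = x_k - h_k * grad f(x_k)
    return x, max_iter

# Toy problem: f(x) = ||x||^{1+nu} / (1+nu), so grad f(x) = ||x||^{nu-1} x,
# which is Hoelder continuous with exponent nu; L = 2.0 is a conservative
# Hoelder constant for nu = 0.5.
if __name__ == "__main__":
    nu, L = 0.5, 2.0
    grad = lambda x: np.linalg.norm(x) ** (nu - 1.0) * x
    x_bar, iters = gradient_descent_holder(grad, x0=[2.0, -1.0], L=L, nu=nu)
    print(iters, np.linalg.norm(grad(x_bar)))
```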
Pages: 1361–1370 (9 pages)