On the global convergence rate of the gradient descent method for functions with Hölder continuous gradients

Cited by: 0
Authors
Maryam Yashtini
Affiliations
[1] Georgia Institute of Technology, School of Mathematics
Source
Optimization Letters | 2016 / Vol. 10
Keywords
Nonlinear programming; Gradient descent method; Global convergence; Hölder continuous gradient; Convergence rate; Upper complexity bound;
DOI
Not available
Abstract
The gradient descent method minimizes an unconstrained nonlinear optimization problem at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ is the number of iterations performed by the gradient method. Traditionally, this analysis is obtained for smooth objective functions having Lipschitz continuous gradients. This paper considers a more general class of nonlinear programming problems in which the functions have Hölder continuous gradients. More precisely, for any function $f$ in this class, denoted by $\mathcal{C}^{1,\nu}_L$, there exist $\nu \in (0,1]$ and $L > 0$ such that for all $\mathbf{x},\mathbf{y} \in \mathbb{R}^n$ the relation $\Vert \nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\Vert \le L \Vert \mathbf{x} - \mathbf{y}\Vert^{\nu}$ holds. We prove that the gradient descent method converges globally to a stationary point and exhibits a convergence rate of $\mathcal{O}(1/K^{\frac{\nu}{\nu+1}})$ when the step-size is chosen properly, i.e., less than $\left[\frac{\nu+1}{L}\right]^{\frac{1}{\nu}} \Vert \nabla f(\mathbf{x}_k)\Vert^{\frac{1}{\nu}-1}$. Moreover, the algorithm requires $\mathcal{O}(1/\epsilon^{\frac{1}{\nu}+1})$ calls to an oracle to find $\bar{\mathbf{x}}$ such that $\Vert \nabla f(\bar{\mathbf{x}})\Vert < \epsilon$.
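
To make the step-size rule in the abstract concrete, the following is a minimal sketch (not code from the paper) of gradient descent with the step size taken as a fixed fraction of the bound $[\frac{\nu+1}{L}]^{\frac{1}{\nu}} \Vert \nabla f(\mathbf{x}_k)\Vert^{\frac{1}{\nu}-1}$. The function name gradient_descent_holder, the safety factor 0.5, and the quadratic test problem are illustrative assumptions.

import numpy as np

def gradient_descent_holder(grad, x0, L, nu, eps=1e-6, max_iter=10000):
    """Gradient descent for f with a nu-Hoelder continuous gradient,
    i.e. ||grad f(x) - grad f(y)|| <= L ||x - y||^nu.

    The step size alpha_k is kept strictly below the admissible bound
    [(nu + 1) / L]^(1/nu) * ||grad f(x_k)||^(1/nu - 1) quoted in the abstract.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < eps:                      # approximate stationary point found
            return x
        alpha_bound = ((nu + 1.0) / L) ** (1.0 / nu) * gnorm ** (1.0 / nu - 1.0)
        alpha = 0.5 * alpha_bound            # any fraction strictly below the bound
        x = x - alpha * g
    return x

# Usage (assumed example): the Lipschitz case nu = 1 with f(x) = 0.5 * x^T A x,
# whose gradient A x is 1-Hoelder (Lipschitz) with L = largest eigenvalue of A.
A = np.diag([1.0, 4.0, 9.0])
grad_f = lambda x: A @ x
x_bar = gradient_descent_holder(grad_f, x0=np.ones(3), L=9.0, nu=1.0)
print(np.linalg.norm(grad_f(x_bar)))         # falls below eps = 1e-6

Note that for $\nu = 1$ the bound reduces to step sizes below $2/L$, recovering the classical Lipschitz-gradient setting and the $\mathcal{O}(1/\sqrt{K})$ rate mentioned at the start of the abstract.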
Pages: 1361-1370
Number of pages: 9
Related papers
50 records in total
  • [21] A Conjugate Gradient Method With Sufficient Descent And Global Convergence For Unconstrained Nonlinear Optimization
    Liu, Hailin
    Cheng, Sui Sun
    Li, Xiaoyong
    APPLIED MATHEMATICS E-NOTES, 2011, 11 : 139 - 147
  • [22] Global convergence of a descent PRP type conjugate gradient method for nonconvex optimization
    Hu, Qingjie
    Zhang, Hongrun
    Chen, Yu
    APPLIED NUMERICAL MATHEMATICS, 2022, 173 : 38 - 50
  • [23] SURPASSING GRADIENT DESCENT PROVABLY: A CYCLIC INCREMENTAL METHOD WITH LINEAR CONVERGENCE RATE
    Mokhtari, Aryan
    Gurbuzbalaban, Mert
    Ribeiro, Alejandro
    SIAM JOURNAL ON OPTIMIZATION, 2018, 28 (02) : 1420 - 1447
  • [24] Global convergence of steepest descent for quadratic functions
    Zeng, ZG
    Huang, DS
    Wang, ZF
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 672 - 677
  • [25] Uniting Nesterov's Accelerated Gradient Descent and the Heavy Ball Method for Strongly Convex Functions with Exponential Convergence Rate
    Hustig-Schultz, Dawn M.
    Sanfelice, Ricardo G.
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 959 - 964
  • [26] Tight Convergence Rate of Gradient Descent for Eigenvalue Computation
    Ding, Qinghua
    Zhou, Kaiwen
    Cheng, James
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3276 - 3282
  • [27] On Hölder global optimization method using piecewise affine bounding functions
    Chenouf, Chahinaz
    Rahal, Mohamed
    NUMERICAL ALGORITHMS, 2023, 94 : 905 - 935
  • [28] A DESCENT FAMILY OF THREE-TERM CONJUGATE GRADIENT METHODS WITH GLOBAL CONVERGENCE FOR GENERAL FUNCTIONS
    Khoshsimaye-Bargard, Maryam
    Ashrafi, Ali
    PACIFIC JOURNAL OF OPTIMIZATION, 2022, 18 (03): : 529 - 543
  • [29] Convergence rates for the stochastic gradient descent method for non-convex objective functions
    Fehrman, Benjamin
    Gess, Benjamin
    Jentzen, Arnulf
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21