On the global convergence rate of the gradient descent method for functions with Hölder continuous gradients

Cited: 0
Authors
Maryam Yashtini
Affiliation
[1] Georgia Institute of Technology, School of Mathematics
Source
Optimization Letters | 2016 / Volume 10
Keywords
Nonlinear programming; Gradient descent method; Global convergence; Hölder continuous gradient; Convergence rate; Upper complexity bound
DOI
Not available
Abstract
The gradient descent method minimizes an unconstrained nonlinear optimization problem at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ is the number of iterations performed by the gradient method. Traditionally, this analysis is obtained for smooth objective functions having Lipschitz continuous gradients. This paper considers a more general class of nonlinear programming problems in which the functions have Hölder continuous gradients. More precisely, for any function $f$ in this class, denoted by $\mathcal{C}^{1,\nu}_L$, there exist $\nu \in (0,1]$ and $L > 0$ such that $\Vert \nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\Vert \le L \Vert \mathbf{x} - \mathbf{y}\Vert^{\nu}$ holds for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. We prove that the gradient descent method converges globally to a stationary point and exhibits a convergence rate of $\mathcal{O}(1/K^{\frac{\nu}{\nu+1}})$ when the step size is chosen properly, i.e., less than $\left[\frac{\nu+1}{L}\right]^{\frac{1}{\nu}} \Vert \nabla f(\mathbf{x}_k)\Vert^{\frac{1}{\nu}-1}$. Moreover, the algorithm requires $\mathcal{O}(1/\epsilon^{\frac{1}{\nu}+1})$ calls to an oracle to find $\bar{\mathbf{x}}$ such that $\Vert \nabla f(\bar{\mathbf{x}})\Vert < \epsilon$.
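Below is a minimal Python sketch of the step-size rule quoted in the abstract, i.e., a step chosen strictly smaller than $\left[\frac{\nu+1}{L}\right]^{1/\nu}\Vert\nabla f(\mathbf{x}_k)\Vert^{1/\nu-1}$. The toy objective $f(\mathbf{x}) = \Vert\mathbf{x}\Vert^{1+\nu}$, the constants $\nu = 0.5$ and $L = 4$, and the safety factor `frac` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def holder_gradient_descent(grad, x0, nu, L, num_iters, frac=0.5):
    """Gradient descent with a step size strictly below the bound from the
    abstract: alpha_k < [(nu + 1)/L]**(1/nu) * ||grad f(x_k)||**(1/nu - 1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:  # already at a stationary point
            break
        # step size: a fraction `frac` of the admissible upper bound
        alpha = frac * ((nu + 1.0) / L) ** (1.0 / nu) * g_norm ** (1.0 / nu - 1.0)
        x = x - alpha * g
    return x

# Toy objective f(x) = ||x||^(1 + nu); its gradient is Hölder continuous with
# exponent nu. The constant L = 4.0 is a rough assumed bound, not a tight one.
nu, L = 0.5, 4.0

def grad_f(x):
    r = np.linalg.norm(x)
    return (1.0 + nu) * r ** (nu - 1.0) * x if r > 0 else np.zeros_like(x)

x_bar = holder_gradient_descent(grad_f, x0=[2.0, -1.0], nu=nu, L=L, num_iters=200)
print(np.linalg.norm(grad_f(x_bar)))  # gradient norm should now be small
```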
Pages: 1361-1370
Number of pages: 9
Related papers
50 records in total
  • [1] On the global convergence rate of the gradient descent method for functions with Hölder continuous gradients
    Yashtini, Maryam
    OPTIMIZATION LETTERS, 2016, 10 (06) : 1361 - 1370
  • [2] The global convergence of a descent PRP conjugate gradient method
    Li, Min
    Feng, Heying
    Liu, Jianguo
    COMPUTATIONAL & APPLIED MATHEMATICS, 2012, 31 (01) : 59 - 83
  • [3] Global convergence of a descent nonlinear conjugate gradient method
    Li, Xiaoyong
    Liu, Hailin
    ICMS2010: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON MODELLING AND SIMULATION, VOL 1: ENGINEERING COMPUTATION AND FINITE ELEMENT ANALYSIS, 2010 : 79 - 84
  • [4] The global convergence of a descent PRP conjugate gradient method
    Li, Min
    Feng, Heying
    Liu, Jianguo
    Computational and Applied Mathematics, 2012, 31 (01) : 59 - 83
  • [5] A new descent memory gradient method and its global convergence
    Sun, Min
    Bai, Qingguo
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2011, 24 (04) : 784 - 794
  • [6] A NEW DESCENT MEMORY GRADIENT METHOD AND ITS GLOBAL CONVERGENCE
    Sun, Min
    Journal of Systems Science & Complexity, 2011, 24 (04) : 784 - 794
  • [7] A sufficient descent conjugate gradient method and its global convergence
    Cheng, Yunlong
    Mou, Qiong
    Pan, Xianbing
    Yao, Shengwei
    OPTIMIZATION METHODS & SOFTWARE, 2016, 31 (03) : 577 - 590
  • [8] A new descent memory gradient method and its global convergence
    Min Sun
    Qingguo Bai
    Journal of Systems Science and Complexity, 2011, 24 : 784 - 794
  • [9] THE RATE OF CONVERGENCE OF THE 2-STEP GRADIENT DESCENT METHOD
    VOSKOBOINIKOV, SP
    SENICHENKOV, YB
    TSUKERMAN, IA
    USSR COMPUTATIONAL MATHEMATICS AND MATHEMATICAL PHYSICS, 1983, 23 (05) : 131 - 133
  • [10] On the Convergence of Decentralized Stochastic Gradient Descent With Biased Gradients
    Jiang, Yiming
    Kang, Helei
    Liu, Jinlan
    Xu, Dongpo
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2025, 73 : 549 - 558