Momentum Acceleration in the Individual Convergence of Nonsmooth Convex Optimization With Constraints

Cited by: 11
Authors
Tao, Wei [1 ]
Wu, Gao-Wei [2 ,3 ]
Tao, Qing [4 ]
Affiliations
[1] Army Engn Univ PLA, Coll Command & Control Engn, Nanjing 210007, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[4] Army Acad Artillery & Air Def, Hefei 230031, Peoples R China
Keywords
Heavy-ball (HB) methods; individual convergence; machine learning; momentum methods; nonsmooth optimization; sparsity;
DOI
10.1109/TNNLS.2020.3040325
CLC Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The momentum technique has recently emerged as an effective strategy for accelerating the convergence of gradient descent (GD) methods and exhibits improved performance in deep learning as well as regularized learning. Typical momentum examples include Nesterov's accelerated gradient (NAG) and heavy-ball (HB) methods. However, almost all acceleration analyses to date are limited to NAG, and few investigations into the acceleration of HB have been reported. In this article, we address the convergence of the last iterate of HB in nonsmooth constrained optimization, which we call individual convergence. This question is significant in machine learning, where constraints are imposed to encode the learning structure and an individual output is needed to guarantee this structure effectively while keeping an optimal rate of convergence. Specifically, we prove that HB achieves an individual convergence rate of O(1/√t), where t is the number of iterations. This indicates that both momentum methods can accelerate the individual convergence of basic GD to optimality. Even for the convergence of averaged iterates, our result avoids the drawbacks of previous work, which restricted the optimization problem to be unconstrained and required the number of iterations to be fixed in advance. The convergence analysis presented in this article provides a clear understanding of how the HB momentum accelerates individual convergence and reveals more insights into the similarities and differences between deriving the averaging and individual convergence rates. The derived optimal individual convergence is extended to regularized and stochastic settings, in which an individual solution can be produced by a projection-based operation. In contrast to the averaged output, the loss of sparsity can be reduced remarkably without sacrificing the theoretically optimal rates. Experiments on several real datasets demonstrate the performance of the HB momentum strategy.
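For intuition, the following is a minimal sketch (not the authors' implementation) of a projected heavy-ball subgradient iteration for a constrained nonsmooth convex problem, using a diminishing 1/√t step size in line with the O(1/√t) individual rate discussed above. The function names, parameter choices (alpha0, beta), and the toy problem are illustrative assumptions only; the paper's analysis uses its own specific parameter schedules.

import numpy as np

def projected_heavy_ball(subgrad, project, x0, T, alpha0=1.0, beta=0.5):
    # Sketch of a projected heavy-ball (HB) subgradient method for
    # min_{x in Q} f(x), with f convex and possibly nonsmooth.
    #   subgrad(x): returns a subgradient of f at x
    #   project(x): Euclidean projection of x onto the constraint set Q
    # The 1/sqrt(t) step size mirrors the O(1/sqrt(t)) individual rate;
    # alpha0 and beta are illustrative choices, not the paper's schedule.
    x_prev = x_curr = project(np.asarray(x0, dtype=float))
    for t in range(1, T + 1):
        alpha_t = alpha0 / np.sqrt(t)            # diminishing step size
        momentum = beta * (x_curr - x_prev)      # heavy-ball momentum term
        x_next = project(x_curr - alpha_t * subgrad(x_curr) + momentum)
        x_prev, x_curr = x_curr, x_next
    return x_curr                                # last (individual) iterate

# Toy usage: minimize ||x - c||_1 over the unit Euclidean ball.
c = np.array([2.0, -1.0, 0.5])
subgrad = lambda x: np.sign(x - c)                    # a subgradient of the l1 objective
project = lambda x: x / max(1.0, np.linalg.norm(x))   # projection onto the unit ball
print(projected_heavy_ball(subgrad, project, np.zeros(3), T=5000))

In the regularized setting mentioned in the abstract, the plain projection step would presumably be replaced by the corresponding projection-based (e.g., proximal) operation, so that the last iterate, rather than an average, is returned and its sparsity is better preserved.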
Pages: 1107 - 1118 (12 pages)
Related Papers
Showing items 31-40 of 50
  • [31] ON CONVERGENCE RATE OF DISTRIBUTED STOCHASTIC GRADIENT ALGORITHM FOR CONVEX OPTIMIZATION WITH INEQUALITY CONSTRAINTS
    Yuan, Deming
    Ho, Daniel W. C.
    Hong, Yiguang
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2016, 54 (05) : 2872 - 2892
  • [32] Variational Analysis Perspective on Linear Convergence of Some First Order Methods for Nonsmooth Convex Optimization Problems
    Ye, Jane J.
    Yuan, Xiaoming
    Zeng, Shangzhi
    Zhang, Jin
    SET-VALUED AND VARIATIONAL ANALYSIS, 2021, 29 (04) : 803 - 837
  • [33] Smoothing Accelerated Proximal Gradient Method with Fast Convergence Rate for Nonsmooth Convex Optimization Beyond Differentiability
    Fan Wu
    Wei Bian
    Journal of Optimization Theory and Applications, 2023, 197 : 539 - 572
  • [34] Smoothing Accelerated Proximal Gradient Method with Fast Convergence Rate for Nonsmooth Convex Optimization Beyond Differentiability
    Wu, Fan
    Bian, Wei
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2023, 197 (02) : 539 - 572
  • [35] Variational Analysis Perspective on Linear Convergence of Some First Order Methods for Nonsmooth Convex Optimization Problems
    Jane J. Ye
    Xiaoming Yuan
    Shangzhi Zeng
    Jin Zhang
    Set-Valued and Variational Analysis, 2021, 29 : 803 - 837
  • [36] Convergence Analysis of Some Methods for Minimizing a Nonsmooth Convex Function
    J. R. Birge
    L. Qi
    Z. Wei
    Journal of Optimization Theory and Applications, 1998, 97 : 357 - 383
  • [37] Convergence analysis of some methods for minimizing a nonsmooth convex function
    Birge, JR
    Qi, L
    Wei, Z
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1998, 97 (02) : 357 - 383
  • [38] Individual Convergence of NAG with Biased Gradient in Nonsmooth Cases
    Liu Y.-X.
    Cheng Y.-J.
    Tao Q.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (04): : 1051 - 1062
  • [39] Convergence analysis of some methods for minimizing a nonsmooth convex function
    Birge, J. R.
    Qi, L.
    Wei, Z.
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1998, 97 (02) : 357 - 383
  • [40] Global Convergence of ADMM in Nonconvex Nonsmooth Optimization
    Yu Wang
    Wotao Yin
    Jinshan Zeng
    Journal of Scientific Computing, 2019, 78 : 29 - 63