A Line Search Based Proximal Stochastic Gradient Algorithm with Dynamical Variance Reduction

Cited: 8
Authors
Franchini, Giorgia [1 ]
Porta, Federica [1 ]
Ruggiero, Valeria [2 ]
Trombini, Ilaria [2 ,3 ]
Affiliations
[1] Univ Modena & Reggio Emilia, Dept Phys Informat & Math, Via Campi 213-B, I-41125 Modena, Italy
[2] Univ Ferrara, Dept Math & Comp Sci, Via Machiavelli 30, I-44121 Ferrara, Italy
[3] Univ Parma, Dept Math Phys & Comp Sci, Parco Area Sci 7-A, I-43124 Parma, Italy
Keywords
First order stochastic methods; Stochastic proximal methods; Machine learning; Green artificial intelligence; Convergence
DOI
10.1007/s10915-022-02084-3
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
Many optimization problems arising in machine learning can be cast as the minimization of the sum of two functions: the first typically represents the expected risk, which in practice is replaced by the empirical risk, while the second encodes a priori information on the solution. Since the first term is in general differentiable and the second one is convex, proximal gradient methods are well suited to such problems. However, in large-scale machine learning applications the computation of the full gradient of the differentiable term can be prohibitively expensive, making these algorithms impractical. For this reason, proximal stochastic gradient methods have been extensively studied in the optimization literature in recent decades. In this paper we develop a proximal stochastic gradient algorithm based on two main ingredients: a technique to dynamically reduce the variance of the stochastic gradients along the iterations, combined with a descent condition in expectation for the objective function that is used to set the steplength parameter at each iteration. For general objective functionals, the almost sure convergence of the limit points of the generated sequence to stationary points can be proved. For convex objective functionals, both the almost sure convergence of the whole sequence of iterates to a minimum point and an O(1/k) convergence rate for the objective function values are shown. The practical implementation of the proposed method requires neither the computation of the exact gradient of the empirical risk during the iterations nor the tuning of an optimal steplength value. An extensive numerical experimentation shows that the proposed approach is robust with respect to the setting of the hyperparameters and competitive with state-of-the-art methods.
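
The Python sketch below illustrates, under simplifying assumptions, the two ingredients described in the abstract: a proximal stochastic gradient step whose variance is reduced dynamically by growing the mini-batch size across iterations, and a backtracking line search that sets the steplength through a sufficient-decrease condition evaluated on the sampled objective (a sampled stand-in for the descent condition in expectation used in the paper). The ℓ1 regularizer, the least-squares loss, all function names, and the hyperparameter values are illustrative assumptions, not the authors' implementation.

import numpy as np


def prox_l1(x, t, lam):
    # Proximal operator of t * lam * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)


def sampled_loss(w, X, y, idx):
    # Least-squares loss evaluated on the mini-batch indexed by idx.
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2)


def sampled_grad(w, X, y, idx):
    # Gradient of the mini-batch least-squares loss.
    r = X[idx] @ w - y[idx]
    return X[idx].T @ r / len(idx)


def prox_sg_linesearch(X, y, lam=0.1, n_iter=60, alpha0=1.0,
                       shrink=0.5, batch0=8, growth=1.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    batch = batch0
    for _ in range(n_iter):
        # Dynamic variance reduction: enlarge the mini-batch geometrically.
        batch = min(n, int(np.ceil(batch * growth)))
        idx = rng.choice(n, size=batch, replace=False)
        g = sampled_grad(w, X, y, idx)
        f_old = sampled_loss(w, X, y, idx)
        alpha = alpha0
        # Backtracking line search: shrink alpha until the classical
        # sufficient-decrease condition for a proximal gradient step
        # holds on the sampled loss.
        while True:
            w_new = prox_l1(w - alpha * g, alpha, lam)
            diff = w_new - w
            bound = f_old + g @ diff + (diff @ diff) / (2.0 * alpha)
            if sampled_loss(w_new, X, y, idx) <= bound or alpha < 1e-12:
                break
            alpha *= shrink
        w = w_new
    return w


if __name__ == "__main__":
    # Toy sparse regression problem to exercise the sketch.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 20))
    w_true = np.zeros(20)
    w_true[:3] = [1.0, -2.0, 0.5]
    y = X @ w_true + 0.01 * rng.standard_normal(200)
    print(np.round(prox_sg_linesearch(X, y)[:5], 2))

In the paper the descent condition is imposed in expectation and the mini-batch growth is driven by the variance-reduction strategy; the per-iteration sampled condition and the fixed geometric growth factor above are simplifications made for brevity.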
Pages: 35