A Line Search Based Proximal Stochastic Gradient Algorithm with Dynamical Variance Reduction

Cited: 8
Authors
Franchini, Giorgia [1 ]
Porta, Federica [1 ]
Ruggiero, Valeria [2 ]
Trombini, Ilaria [2 ,3 ]
Affiliations
[1] Univ Modena & Reggio Emilia, Dept Phys Informat & Math, Via Campi 213-B, I-41125 Modena, Italy
[2] Univ Ferrara, Dept Math & Comp Sci, Via Machiavelli 30, I-44121 Ferrara, Italy
[3] Univ Parma, Dept Math Phys & Comp Sci, Parco Area Sci 7-A, I-43124 Parma, Italy
Keywords
First order stochastic methods; Stochastic proximal methods; Machine learning; Green artificial intelligence; Convergence
DOI
10.1007/s10915-022-02084-3
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
Many optimization problems arising in machine learning can be cast as the minimization of the sum of two functions: the first typically represents the expected risk, which in practice is replaced by the empirical risk, while the second encodes a priori information on the solution. Since the first term is in general differentiable and the second one is convex, proximal gradient methods are well suited to such problems. However, in large-scale machine learning applications the computation of the full gradient of the differentiable term can be prohibitively expensive, making these algorithms impractical. For this reason, proximal stochastic gradient methods have been extensively studied in the optimization literature in recent decades. In this paper we develop a proximal stochastic gradient algorithm based on two main ingredients: a technique to dynamically reduce the variance of the stochastic gradients along the iterations, combined with a descent condition in expectation for the objective function that is used to set the steplength parameter at each iteration. For general objective functionals, the almost sure convergence of the limit points of the generated sequence to stationary points can be proved. For convex objective functionals, both the almost sure convergence of the whole sequence of iterates to a minimum point and an O(1/k) convergence rate for the objective function values are shown. The practical implementation of the proposed method requires neither the computation of the exact gradient of the empirical risk during the iterations nor the tuning of an optimal steplength value. An extensive numerical experimentation shows that the proposed approach is robust with respect to the setting of the hyperparameters and competitive with state-of-the-art methods.
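
The Python sketch below illustrates, under simplifying assumptions, the two ingredients described in the abstract: a proximal stochastic gradient step whose variance is reduced dynamically by growing the mini-batch size across iterations, and a backtracking line search that sets the steplength through a sufficient-decrease condition evaluated on the sampled objective (a sampled stand-in for the descent condition in expectation used in the paper). The ℓ1 regularizer, the least-squares loss, all function names, and the hyperparameter values are illustrative assumptions, not the authors' implementation.

import numpy as np


def prox_l1(x, t, lam):
    # Proximal operator of t * lam * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)


def sampled_loss(w, X, y, idx):
    # Least-squares loss evaluated on the mini-batch indexed by idx.
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2)


def sampled_grad(w, X, y, idx):
    # Gradient of the mini-batch least-squares loss.
    r = X[idx] @ w - y[idx]
    return X[idx].T @ r / len(idx)


def prox_sg_linesearch(X, y, lam=0.1, n_iter=60, alpha0=1.0,
                       shrink=0.5, batch0=8, growth=1.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    batch = batch0
    for _ in range(n_iter):
        # Dynamic variance reduction: enlarge the mini-batch geometrically.
        batch = min(n, int(np.ceil(batch * growth)))
        idx = rng.choice(n, size=batch, replace=False)
        g = sampled_grad(w, X, y, idx)
        f_old = sampled_loss(w, X, y, idx)
        alpha = alpha0
        # Backtracking line search: shrink alpha until the classical
        # sufficient-decrease condition for a proximal gradient step
        # holds on the sampled loss.
        while True:
            w_new = prox_l1(w - alpha * g, alpha, lam)
            diff = w_new - w
            bound = f_old + g @ diff + (diff @ diff) / (2.0 * alpha)
            if sampled_loss(w_new, X, y, idx) <= bound or alpha < 1e-12:
                break
            alpha *= shrink
        w = w_new
    return w


if __name__ == "__main__":
    # Toy sparse regression problem to exercise the sketch.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 20))
    w_true = np.zeros(20)
    w_true[:3] = [1.0, -2.0, 0.5]
    y = X @ w_true + 0.01 * rng.standard_normal(200)
    print(np.round(prox_sg_linesearch(X, y)[:5], 2))

In the paper the descent condition is imposed in expectation and the mini-batch growth is driven by the variance-reduction strategy; the per-iteration sampled condition and the fixed geometric growth factor above are simplifications made for brevity.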
Pages: 35