A mini-batch stochastic conjugate gradient algorithm with variance reduction

Cited: 0
Authors
Caixia Kou
Han Yang
Institution
[1] Beijing University of Posts and Telecommunications, School of Science
Keywords
Deep learning; Empirical risk minimization; Stochastic conjugate gradient; Linear convergence
DOI
Not available
Abstract
The stochastic gradient descent method is popular for large-scale optimization, but it converges slowly in the asymptotic regime because of the inherent variance of its gradient estimates. To remedy this problem, many explicit variance reduction methods for stochastic gradient descent have been proposed, such as SVRG of Johnson and Zhang [Advances in Neural Information Processing Systems (2013), pp. 315–323], SAG of Roux et al. [Advances in Neural Information Processing Systems (2012), pp. 2663–2671], and SAGA of Defazio et al. [Advances in Neural Information Processing Systems (2014), pp. 1646–1654]. The conjugate gradient method, which has the same per-iteration computation cost as the gradient descent method, is considered here. In this paper, in the spirit of SAGA, we propose a stochastic conjugate gradient algorithm, which we call SCGA. With Fletcher–Reeves type choices of the conjugate parameter, we prove a linear convergence rate for smooth and strongly convex functions. We experimentally demonstrate that SCGA converges faster than popular SGD-type algorithms on four machine learning models, which may be convex, nonconvex, or nonsmooth. On regression problems, SCGA is competitive with CGVR, which, to our knowledge, is the only other stochastic conjugate gradient algorithm with variance reduction to date.
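To illustrate how a SAGA-style variance-reduced gradient estimate can be combined with a Fletcher–Reeves conjugate direction, the following Python sketch shows one possible update loop for minimizing an average of n smooth component losses. It is a minimal sketch under assumed interfaces (the callback grad_i, the fixed step size step, and the epoch count are illustrative placeholders), not the authors' exact SCGA algorithm.

```python
import numpy as np

def scga_sketch(grad_i, w0, n, step=0.01, epochs=5, rng=None):
    """Sketch of a SAGA-style stochastic conjugate gradient loop.

    grad_i(w, i) should return the gradient of the i-th component loss
    at w.  A table of stored per-sample gradients is kept as in SAGA,
    and search directions use a Fletcher-Reeves type beta.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.astype(float).copy()
    # Stored per-sample gradients (the SAGA table) and their mean.
    table = np.array([grad_i(w, i) for i in range(n)])
    table_mean = table.mean(axis=0)

    g_prev = table_mean.copy()
    d = -g_prev                      # initial (steepest-descent) direction
    for _ in range(epochs * n):
        i = rng.integers(n)
        g_i = grad_i(w, i)
        # SAGA-style variance-reduced gradient estimate.
        g = g_i - table[i] + table_mean
        # Update the stored gradient and its running mean.
        table_mean += (g_i - table[i]) / n
        table[i] = g_i
        # Fletcher-Reeves type conjugate parameter.
        beta = (g @ g) / (g_prev @ g_prev + 1e-12)
        d = -g + beta * d
        w += step * d
        g_prev = g
    return w
```

A SAGA-style table of stored gradients avoids the periodic full-gradient snapshots used by SVRG at the cost of extra memory proportional to n times the dimension; the abstract states that SCGA follows the SAGA route, while its actual step-size rule and any restart conditions may differ from the fixed step used in this sketch.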
Pages: 1009–1025
Page count: 16
Related papers
50 items in total
  • [21] Adaptive Natural Gradient Method for Learning of Stochastic Neural Networks in Mini-Batch Mode
    Park, Hyeyoung
    Lee, Kwanyong
    APPLIED SCIENCES-BASEL, 2019, 9 (21):
  • [22] Stronger Adversarial Attack: Using Mini-batch Gradient
    Yu, Lin
    Deng, Ting
    Zhang, Wenxiang
    Zeng, Zhigang
    2020 12TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2020, : 364 - 370
  • [23] Gradient preconditioned mini-batch SGD for ridge regression
    Zhang, Zhuan
    Zhou, Shuisheng
    Li, Dong
    Yang, Ting
    NEUROCOMPUTING, 2020, 413 : 284 - 293
  • [24] Mini-batch stochastic subgradient for functional constrained optimization
    Singh, Nitesh Kumar
    Necoara, Ion
    Kungurtsev, Vyacheslav
    OPTIMIZATION, 2024, 73 (07) : 2159 - 2185
  • [25] Scalable Hardware Accelerator for Mini-Batch Gradient Descent
    Rasoori, Sandeep
    Akella, Venkatesh
    PROCEEDINGS OF THE 2018 GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI'18), 2018, : 159 - 164
  • [26] A Learning Algorithm with a Gradient Normalization and a Learning Rate Adaptation for the Mini-batch Type Learning
    Ito, Daiki
    Okamoto, Takashi
    Koakutsu, Seiichi
    2017 56TH ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS OF JAPAN (SICE), 2017, : 811 - 816
  • [27] AN OPTIMAL DESIGN OF RANDOM SURFACES IN SOLAR CELLS VIA MINI-BATCH STOCHASTIC GRADIENT APPROACH
    Wang, Dan
    Li, Qiang
    Shen, Jihong
    COMMUNICATIONS IN MATHEMATICAL SCIENCES, 2022, 20 (03) : 747 - 762
  • [28] Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods
    Franchini, Giorgia
    Porta, Federica
    INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2022, ICNAAM-2022, 2024, 3094
  • [29] Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization
    Ghadimi, Saeed
    Lan, Guanghui
    Zhang, Hongchao
    MATHEMATICAL PROGRAMMING, 2016, 155 (1-2) : 267 - 305
  • [30] Efficient distance metric learning by adaptive sampling and mini-batch stochastic gradient descent (SGD)
    Qian, Qi
    Jin, Rong
    Yi, Jinfeng
    Zhang, Lijun
    Zhu, Shenghuo
    MACHINE LEARNING, 2015, 99 (03) : 353 - 372