Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning

Times Cited: 3
Authors
Yang, Zhuang [1 ]
Affiliation
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Keywords
Machine learning algorithms; Sensitivity; Machine learning; Ordinary differential equations; Information retrieval; Robustness; Computational complexity; Adaptive learning rate; Conjugate gradient; Large-scale learning; Powerball function; Stochastic optimization; Quasi-Newton method
DOI
10.1109/TBDATA.2023.3300546
CLC Classification Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Stochastic optimization (SO) has achieved remarkable success in large-scale machine learning, information retrieval, bioinformatics, and related fields, especially in recent years. As an effective acceleration tactic, the conjugate gradient (CG) method has been gaining popularity for speeding up SO algorithms. This paper develops a novel class of stochastic conjugate gradient (SCG) algorithms built on the Powerball strategy and the hypergradient descent (HD) technique. The crucial idea behind the resulting methods is inspired by pursuing the equilibrium of ordinary differential equations (ODEs). We elucidate the effect of the Powerball strategy in SCG algorithms; the introduction of HD, on the other hand, equips the resulting methods with an online learning rate. We also establish theoretical guarantees for the resulting algorithms under non-convex assumptions. As a byproduct, we bridge the gap between the learning rate and powered stochastic optimization (PSO) algorithms, which had remained an open problem. Through numerical experiments on numerous benchmark datasets, we examine the parameter sensitivity of the proposed methods and demonstrate their superior performance over state-of-the-art algorithms.
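The abstract names three computational ingredients: a Powerball transform of the stochastic gradient, a conjugate-gradient search direction, and a hypergradient-adapted (online) learning rate. The minimal NumPy sketch below shows one plausible way these pieces fit into a single update step; the Fletcher-Reeves coefficient, the specific hypergradient rule, and all parameter values are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def powerball(g, gamma):
    # Elementwise Powerball transform sign(g) * |g|^gamma; gamma = 1
    # recovers the plain gradient, gamma in (0, 1) reshapes magnitudes.
    return np.sign(g) * np.abs(g) ** gamma

def pscg_hd_step(w, d_prev, g_prev, lr, grad_fn, gamma=0.6, beta=1e-4):
    # One illustrative update; grad_fn(w) stands in for a minibatch gradient.
    g = powerball(grad_fn(w), gamma)      # powered stochastic gradient
    # Hypergradient descent: the previous step moved w along lr * d_prev,
    # so d f(w) / d lr = g . d_prev; step the learning rate against it.
    lr = lr - beta * float(g @ d_prev)
    # Fletcher-Reeves conjugate coefficient (one standard CG rule;
    # the paper's exact choice may differ).
    fr = float(g @ g) / max(float(g_prev @ g_prev), 1e-12)
    d = -g + fr * d_prev                  # conjugate search direction
    return w + lr * d, d, g, lr

# Toy usage on a synthetic least-squares problem (hypothetical data).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad_fn = lambda w: 2.0 * A.T @ (A @ w - b) / len(b)
w, g = np.zeros(5), grad_fn(np.zeros(5))
d, lr = -g, 1e-2
for _ in range(200):
    w, d, g, lr = pscg_hd_step(w, d, g, lr, grad_fn)

Note how the learning rate is treated as one extra parameter updated by gradient descent on itself; in the abstract's terms, this is what makes the learning rate "online" rather than a fixed hyperparameter.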
Pages: 1598-1606
Number of Pages: 9