Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning

Cited by: 3
Authors
Yang, Zhuang [1]
Affiliation
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Keywords
Machine learning algorithms; Sensitivity; Machine learning; Ordinary differential equations; Information retrieval; Robustness; Computational complexity; Adaptive learning rate; conjugate gradient; large-scale learning; powerball function; stochastic optimization; quasi-Newton method
DOI: 10.1109/TBDATA.2023.3300546
Chinese Library Classification: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
The remarkable success of stochastic optimization (SO) in large-scale machine learning, information retrieval, bioinformatics, and related fields has been widely reported, especially in recent years. As an effective acceleration technique, the conjugate gradient (CG) method has been gaining popularity in SO algorithms. This paper develops a novel class of stochastic conjugate gradient (SCG) algorithms from the perspective of the Powerball strategy and the hypergradient descent (HD) technique. The key idea behind the resulting methods is inspired by the pursuit of equilibria of ordinary differential equations (ODEs). We elucidate the effect of the Powerball strategy in SCG algorithms. The introduction of HD, on the other hand, equips the resulting methods with an online learning rate. Meanwhile, we provide a comprehensive analysis of the theoretical results for the resulting algorithms under non-convex assumptions. As a byproduct, we bridge the gap between the learning rate and powered stochastic optimization (PSO) algorithms, which has remained an open problem. Through numerical experiments on numerous benchmark datasets, we test the parameter sensitivity of the proposed methods and demonstrate the superior performance of our new algorithms over state-of-the-art methods.
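The abstract names three ingredients: the Powerball function, which raises each gradient coordinate to a power gamma while keeping its sign; a conjugate gradient direction update; and hypergradient descent, which adapts the learning rate online from the inner product of successive search directions. The sketch below combines them in NumPy under stated assumptions: the Fletcher-Reeves CG coefficient, the names powerball and powerball_scg_hd, and all default values (e.g., gamma=0.6, hd_rate=1e-4) are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def powerball(g, gamma=0.6):
    # Elementwise Powerball transform: sign(g) * |g|**gamma, gamma in (0, 1].
    return np.sign(g) * np.abs(g) ** gamma

def powerball_scg_hd(stoch_grad, w, lr=0.01, hd_rate=1e-4, gamma=0.6, n_steps=200):
    """Sketch of a Powerball stochastic CG loop with a hypergradient-adapted
    learning rate. `stoch_grad(w)` returns a (minibatch) gradient at w."""
    g = powerball(stoch_grad(w), gamma)
    d = -g                                    # first direction: steepest descent
    for _ in range(n_steps):
        w = w + lr * d                        # move along the conjugate direction
        g_new = powerball(stoch_grad(w), gamma)
        # Hypergradient descent (Baydin et al., 2018): the derivative of the
        # loss w.r.t. lr is approximately g_new . d, so step lr against it.
        lr = lr - hd_rate * (g_new @ d)
        # Fletcher-Reeves conjugacy coefficient on the powered gradients
        # (one common CG coefficient; an assumption, not the paper's choice).
        beta_fr = (g_new @ g_new) / (g @ g + 1e-12)
        d = -g_new + beta_fr * d              # new conjugate direction
        g = g_new
    return w

# Toy usage on a least-squares problem (illustration of the interface only).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((500, 20)), rng.standard_normal(500)
grad = lambda w: A.T @ (A @ w - b) / 500      # full gradient as a stand-in
w_hat = powerball_scg_hd(grad, np.zeros(20))
```

Note the sign convention: since d is a descent direction, g_new @ d is typically negative, so the hypergradient update grows lr while progress continues and shrinks it once successive directions disagree.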
Pages: 1598-1606 (9 pages)
Related Papers (50 records in total)
  • [11] Adaptive step size rules for stochastic optimization in large-scale learning
    Yang, Zhuang
    Ma, Li
    STATISTICS AND COMPUTING, 2023, 33 (02)
  • [12] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [13] Adaptive Alternating Stochastic Gradient Descent Algorithms for Large-Scale Latent Factor Analysis
    Qin, Wen
    Luo, Xin
    Zhou, MengChu
    2021 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2021), 2021, : 285 - 290
  • [14] Optimizing multiple conjugate gradient solvers for large-scale systems
    Sancho, Jose Carlos
    Kerbyson, Darren J.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2009, 21 (14): 1804 - 1818
  • [15] A subspace conjugate gradient algorithm for large-scale unconstrained optimization
    Yang, Yueting
    Chen, Yuting
    Lu, Yunlong
    NUMERICAL ALGORITHMS, 2017, 76 (03) : 813 - 828
  • [16] A family of conjugate gradient methods for large-scale nonlinear equations
    Feng, Dexiang
    Sun, Min
    Wang, Xueyong
    JOURNAL OF INEQUALITIES AND APPLICATIONS, 2017
  • [19] Conjugate-Gradient Methods for Large-Scale Minimization in Meteorology
    Navon, I. M.
    Legler, D. M.
    MONTHLY WEATHER REVIEW, 1987, 115 (08) : 1479 - 1502
  • [20] Distributing the Stochastic Gradient Sampler for Large-Scale LDA
    Yang, Yuan
    Chen, Jianfei
    Zhu, Jun
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1975 - 1984