The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems

Cited by: 1
Authors
Yang, Zhuang [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Optimization; Convergence; Approximation algorithms; Stochastic processes; Learning systems; Support vector machines; Noise measurement; Biased gradient estimator; convergence rates; large-scale datasets; Powerball function; stochastic optimization (SO); DESCENT; CONVERGENCE;
DOI
10.1109/TCSS.2024.3411630
CLC number
TP3 [Computing Technology and Computer Technology];
Discipline code
0812;
Abstract
The Powerball method, which incorporates a power coefficient into conventional optimization algorithms, has in recent years been used to accelerate stochastic optimization (SO) algorithms, giving rise to a family of powered stochastic optimization (PSO) algorithms. Although the Powerball technique is orthogonal to existing acceleration techniques for SO algorithms (e.g., learning-rate adjustment strategies), current PSO algorithms largely reuse the algorithmic framework of SO algorithms and therefore inherit their low convergence rates and unstable performance on practical problems. Motivated by this gap, this work develops a novel class of PSO algorithms from the perspective of biased stochastic gradient estimation (BSGE). Specifically, we first study the theoretical properties and empirical behavior of vanilla powered stochastic gradient descent (P-SGD) with BSGE. Second, to further demonstrate the benefit of BSGE in enhancing P-SGD-type algorithms, we analyze, both theoretically and experimentally, P-SGD with momentum under BSGE, paying particular attention to the effect of negative momentum, which has received little study in PSO. In particular, we prove that the overall complexity of the resulting algorithms matches that of state-of-the-art SO algorithms. Finally, extensive numerical experiments on benchmark datasets confirm the effectiveness of BSGE in improving PSO. This work clarifies the role of BSGE in PSO algorithms and extends the family of PSO algorithms.
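To make the mechanism concrete, the sketch below illustrates the kind of update the abstract describes: a Powerball transform (elementwise signed power of the gradient) applied to a biased gradient estimate, combined with a momentum term whose coefficient may be negative. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the logistic-regression objective, the exponential-moving-average estimator standing in for BSGE, and the hyperparameter names (gamma, eta, beta, alpha) are all illustrative choices.

```python
# Illustrative sketch: powered SGD with a biased (moving-average) gradient
# estimator and possibly negative momentum. Assumptions only; not the
# paper's exact BSGE or P-SGD variant.
import numpy as np

def powerball(g, gamma=0.5):
    """Elementwise Powerball transform: sign(g) * |g|**gamma, 0 < gamma < 1."""
    return np.sign(g) * np.abs(g) ** gamma

def logistic_grad(w, X, y):
    """Minibatch gradient of l2-regularized logistic loss, labels in {-1, +1}."""
    margin = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(margin)))) / len(y) + 1e-4 * w

def p_sgd_momentum(X, y, gamma=0.5, eta=0.1, beta=-0.2, alpha=0.9,
                   batch=32, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    d = np.zeros_like(w)   # biased gradient estimate (exponential moving average)
    v = np.zeros_like(w)   # momentum buffer; beta < 0 gives negative momentum
    for _ in range(iters):
        idx = rng.choice(len(y), size=batch, replace=False)
        g = logistic_grad(w, X[idx], y[idx])
        d = alpha * d + (1.0 - alpha) * g      # one simple biased estimator
        v = beta * v + powerball(d, gamma)     # powered search direction
        w = w - eta * v
    return w

# Example usage on synthetic data:
# rng = np.random.default_rng(1)
# X = rng.normal(size=(200, 5)); y = np.sign(X @ np.ones(5))
# w_hat = p_sgd_momentum(X, y)
```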
Pages: 13
Related papers
50 records
  • [1] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [2] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [3] SAAGs: Biased stochastic variance reduction methods for large-scale learning
    Chauhan, Vinod Kumar
    Sharma, Anuj
    Dahiya, Kalpana
    APPLIED INTELLIGENCE, 2019, 49 (09) : 3331 - 3361
  • [4] Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14645 - 14658
  • [5] Mean-Normalized Stochastic Gradient for Large-Scale Deep Learning
    Wiesler, Simon
    Richard, Alexander
    Schlueter, Ralf
    Ney, Hermann
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] Large-scale machine learning with fast and stable stochastic conjugate gradient
    Yang, Zhuang
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 173
  • [7] On Biased Stochastic Gradient Estimation
    Driggs, Derek
    Liang, Jingwei
    Schonlieb, Carola-Bibiane
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [8] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [9] A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization
    Bi, Jia
    Gunn, Steve R.
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2019, 11671 : 337 - 349