A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC

Cited by: 0
Authors
Changyou CHEN [1]
Wenlin WANG [2]
Yizhe ZHANG [3]
Qinliang SU [4]
Lawrence CARIN [2]
Affiliations
[1] Department of Computer Science and Engineering
[2] Department of Electrical and Computer Engineering, Duke University
[3] Microsoft Research
[4] School of Data and Computer Science, Sun Yat-sen University
Keywords
Markov chain Monte Carlo; SG-MCMC; variance reduction; deep neural networks;
DOI
Not available
Chinese Library Classification
TP18 [artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on the algorithm's convergence rate. In this paper, we prove that at the beginning of an SG-MCMC algorithm, i.e., under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound. The reason is the prominent noise in stochastic gradients computed from small minibatches, which motivates the use of variance reduction in SG-MCMC for practical applications. Borrowing ideas from stochastic optimization, we propose a simple and practical variance-reduction technique for SG-MCMC that is efficient in both computation and storage. More importantly, we develop the theory to prove that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
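For intuition, the sketch below illustrates the general template the abstract describes: SVRG-style control variates applied to stochastic gradient Langevin dynamics (SGLD). It is a minimal illustration, not the authors' exact algorithm; the function names (`svrg_sgld`, `grad_log_post`, `grad_i`) and all hyperparameter values are hypothetical choices for this example.

```python
import numpy as np

def svrg_sgld(grad_log_post, theta0, n_data, n_epochs=10,
              batch_size=32, step_size=1e-4, seed=0):
    """SVRG-style variance-reduced SGLD (illustrative sketch).

    `grad_log_post(theta, idx)` must return the sum, over the data
    indices in `idx`, of per-example log-posterior gradients.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    samples = []
    all_idx = np.arange(n_data)
    for _ in range(n_epochs):
        # Snapshot: one full-gradient pass per epoch at an anchor point.
        anchor = theta.copy()
        full_grad = grad_log_post(anchor, all_idx)
        for _ in range(n_data // batch_size):
            idx = rng.choice(n_data, size=batch_size, replace=False)
            # Control-variate estimator: unbiased for the full-data
            # gradient, with lower variance while theta is near the anchor.
            g = (n_data / batch_size) * (
                grad_log_post(theta, idx) - grad_log_post(anchor, idx)
            ) + full_grad
            # Langevin update: gradient half-step plus injected Gaussian noise.
            theta = (theta + 0.5 * step_size * g
                     + np.sqrt(step_size) * rng.normal(size=theta.shape))
            samples.append(theta.copy())
    return np.array(samples)

# Toy check: posterior of the mean of a unit-variance Gaussian, flat prior.
data = np.random.default_rng(1).normal(loc=2.0, size=1000)

def grad_i(theta, idx):
    # d/dtheta of sum_{i in idx} log N(x_i | theta, 1).
    return np.sum(data[idx] - theta)

draws = svrg_sgld(grad_i, theta0=np.zeros(1), n_data=len(data))
print(draws[-200:].mean())  # settles near the sample mean, ~2.0
```

In this template the anchor snapshot costs one full-data gradient pass per epoch plus O(d) extra storage for `anchor` and `full_grad`; practical variants in this family trade exactness of the anchor gradient for cheaper updates, which is the computation/storage efficiency concern the abstract raises.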
Pages: 67-79
Number of pages: 13
Related papers
50 records in total
  • [41] General inertial proximal stochastic variance reduction gradient for nonconvex nonsmooth optimization
    Sun, Shuya
    He, Lulu
    JOURNAL OF INEQUALITIES AND APPLICATIONS, 2023, 2023 (01)
  • [43] Stochastic quasi-gradient methods: variance reduction via Jacobian sketching
    Gower, Robert M.
    Richtarik, Peter
    Bach, Francis
    MATHEMATICAL PROGRAMMING, 2021, 188 (01) : 135 - 192
  • [45] A Line Search Based Proximal Stochastic Gradient Algorithm with Dynamical Variance Reduction
    Franchini, Giorgia
    Porta, Federica
    Ruggiero, Valeria
    Trombini, Ilaria
    JOURNAL OF SCIENTIFIC COMPUTING, 2023, 94 (01)
  • [47] Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems
    Yang, Junchi
    Kiyavash, Negar
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [48] Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction
    Feng, Jie
    Wei, Ke
    Chen, Jinchi
    JOURNAL OF SCIENTIFIC COMPUTING, 2024, 101 (02)
  • [49] Continuous-Time Stochastic Mirror Descent on a Network: Variance Reduction, Consensus, Convergence
    Raginsky, Maxim
    Bouvrie, Jake
    2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 6793 - 6800
  • [50] N-SVRG: Stochastic Variance Reduction Gradient with Noise Reduction Ability for Small Batch Samples
    Pan, Haijie
    Zheng, Lirong
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2022, 131 (01) : 493 - 512