A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC

Cited: 0
Authors
Changyou CHEN [1]
Wenlin WANG [2]
Yizhe ZHANG [3]
Qinliang SU [4]
Lawrence CARIN [2]
Affiliations
[1] Department of Computer Science and Engineering
[2] Department of Electrical and Computer Engineering, Duke University
[3] Microsoft Research
[4] School of Data and Computer Science, Sun Yat-sen University
Keywords
Markov chain Monte Carlo; SG-MCMC; variance reduction; deep neural networks;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on the algorithm's convergence rate. In this paper, we prove that at the beginning of an SG-MCMC run, i.e., under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound. This is due to the prominent noise of stochastic gradients computed on small minibatches, which motivates the need for variance reduction in SG-MCMC for practical use. Borrowing ideas from stochastic optimization, we propose a simple and practical variance-reduction technique for SG-MCMC that is efficient in both computation and storage. More importantly, we develop theory proving that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
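For intuition, here is a minimal sketch of the kind of SVRG-style control-variate update the abstract alludes to, applied to stochastic gradient Langevin dynamics (SGLD). It is not the paper's actual algorithm: the function names (vr_sgld, grad_log_lik, grad_log_prior) and all hyperparameter defaults are illustrative assumptions.

```python
# Minimal sketch of SVRG-style variance-reduced SGLD (all names hypothetical,
# not the paper's code). Assumes user-supplied callables returning the gradient
# of the log-likelihood of one example and the gradient of the log-prior.
import numpy as np

def vr_sgld(theta, data, grad_log_lik, grad_log_prior,
            step_size=1e-4, batch_size=32, epoch_len=100,
            n_iters=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    N = len(data)
    for t in range(n_iters):
        if t % epoch_len == 0:
            # Periodically snapshot an anchor point and cache its full-data
            # gradient; this expensive pass is amortized over the next
            # epoch_len cheap minibatch steps.
            theta_anchor = theta.copy()
            g_anchor = sum(grad_log_lik(theta_anchor, x) for x in data)
        idx = rng.choice(N, size=batch_size, replace=False)
        # Control-variate estimate: correct the minibatch gradient by the same
        # minibatch evaluated at the anchor. The estimator remains unbiased for
        # the full-data gradient, while its variance shrinks whenever theta
        # stays close to theta_anchor.
        g_diff = sum(grad_log_lik(theta, data[i]) -
                     grad_log_lik(theta_anchor, data[i]) for i in idx)
        g_est = (N / batch_size) * g_diff + g_anchor
        # Langevin step: drift along the estimated log-posterior gradient plus
        # Gaussian noise whose variance matches the step size.
        noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
        theta = theta + 0.5 * step_size * (grad_log_prior(theta) + g_est) + noise
    return theta
```

Under these assumptions, the full-data gradient is recomputed only once per epoch_len steps and only the anchor parameters and one cached gradient are stored, which is consistent with the abstract's claim of efficiency in both computation and storage.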
Pages: 67-79
Page count: 13
Related papers
50 items in total
  • [31] A mini-batch stochastic conjugate gradient algorithm with variance reduction
    Kou, Caixia
    Yang, Han
    JOURNAL OF GLOBAL OPTIMIZATION, 2023, 87 (2-4) : 1009 - 1025
  • [32] Stochastic distributed learning with gradient quantization and double-variance reduction
    Horvath, Samuel
    Kovalev, Dmitry
    Mishchenko, Konstantin
    Richtarik, Peter
    Stich, Sebastian
    OPTIMIZATION METHODS & SOFTWARE, 2023, 38 (01): : 91 - 106
  • [34] Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction
    Zou, Difan
    Xu, Pan
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [35] Stochastic gradient Hamiltonian Monte Carlo with variance reduction for Bayesian inference
    Li, Zhize
    Zhang, Tianyi
    Cheng, Shuyu
    Zhu, Jun
    Li, Jian
    MACHINE LEARNING, 2019, 108 (8-9) : 1701 - 1727
  • [37] Convergence Analysis of a Class of Nonsmooth Gradient Systems
    Lu, Wenlian
    Wang, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2008, 55 (11) : 3514 - 3527
  • [38] Convergence analysis of distributed stochastic gradient descent with shuffling
    Meng, Qi
    Chen, Wei
    Wang, Yue
    Ma, Zhi-Ming
    Liu, Tie-Yan
    NEUROCOMPUTING, 2019, 337 : 46 - 57
  • [39] Convergence analysis of asynchronous stochastic recursive gradient algorithms
    Wang, Pengfei
    Zheng, Nenggan
    KNOWLEDGE-BASED SYSTEMS, 2022, 252
  • [40] An analysis of stochastic variance reduced gradient for linear inverse problems
    Jin, Bangti
    Zhou, Zehui
    Zou, Jun
    INVERSE PROBLEMS, 2022, 38 (02)