A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC

Cited by: 0
Authors
Changyou Chen [1]
Wenlin Wang [2]
Yizhe Zhang [3]
Qinliang Su [4]
Lawrence Carin [2]
Affiliations
[1] Department of Computer Science and Engineering
[2] Department of Electrical and Computer Engineering, Duke University
[3] Microsoft Research
[4] School of Data and Computer Science, Sun Yat-sen University
Keywords
Markov chain Monte Carlo; SG-MCMC; variance reduction; deep neural networks;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on the algorithm's convergence rate. In this paper, we prove that at the beginning of an SG-MCMC algorithm, i.e., under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound. The reason is the prominent noise of stochastic gradients computed on small minibatches, which motivates the need for variance reduction in SG-MCMC for practical use. Borrowing ideas from stochastic optimization, we propose a simple and practical variance-reduction technique for SG-MCMC that is efficient in both computation and storage. More importantly, we develop theory to prove that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
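To illustrate the kind of variance reduction described above, the sketch below applies an SVRG-style control variate, borrowed from stochastic optimization, to stochastic gradient Langevin dynamics for Bayesian logistic regression. This is a minimal illustration of the general idea rather than the authors' exact algorithm; the function names, step size, snapshot schedule, and toy data are assumptions made for the example.

import numpy as np

def grad_log_lik(theta, X, y):
    # Sum of per-example log-likelihood gradients for Bayesian logistic regression.
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (y - p)

def grad_log_prior(theta, tau=1.0):
    # Gradient of an isotropic Gaussian prior N(0, tau^2 I).
    return -theta / tau ** 2

def svrg_sgld(X, y, step=1e-3, minibatch=32, epoch_len=None, n_iters=2000, seed=0):
    # SGLD whose minibatch gradient is corrected by a control variate built from
    # a periodically refreshed full-data gradient at a snapshot parameter.
    rng = np.random.default_rng(seed)
    N, d = X.shape
    epoch_len = epoch_len or max(1, N // minibatch)
    theta = np.zeros(d)
    samples = []
    for t in range(n_iters):
        if t % epoch_len == 0:
            snapshot = theta.copy()                        # anchor point
            full_ll_grad = grad_log_lik(snapshot, X, y)    # full-data likelihood gradient
        idx = rng.choice(N, size=minibatch, replace=False)
        scale = N / minibatch                              # unbiased rescaling of the minibatch sum
        g = (grad_log_prior(theta)
             + scale * (grad_log_lik(theta, X[idx], y[idx])
                        - grad_log_lik(snapshot, X[idx], y[idx]))
             + full_ll_grad)                               # variance-reduced log-posterior gradient
        theta = theta + step * g + np.sqrt(2.0 * step) * rng.normal(size=d)
        samples.append(theta.copy())
    return np.array(samples)

if __name__ == "__main__":
    # Tiny synthetic check: 2-D logistic-regression data with known weights.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 2))
    w_true = np.array([1.5, -2.0])
    y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
    samples = svrg_sgld(X, y)
    print("posterior mean estimate:", samples[1000:].mean(axis=0))

With the snapshot refreshed once per effective epoch, the correction costs one extra minibatch gradient per step plus an occasional full-data gradient and a stored snapshot, illustrating why such a scheme can remain cheap in both computation and storage.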
Pages: 67-79
Number of pages: 13
Related papers
50 records in total
  • [21] Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis
    Kasai, Hiroyuki
    Sato, Hiroyuki
    Mishra, Bamdev
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [22] Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
    Wang, Hongjian
    Gurbuzbalaban, Mert
    Zhu, Lingjiong
    Simsekli, Umut
    Erdogdu, Murat A.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [23] Convergence of Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction
    Suzuki, Taiji
    Wu, Denny
    Nitanda, Atsushi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [24] On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
    Reddi, Sashank J.
    Hefny, Ahmed
    Sra, Suvrit
    Poczos, Barnabas
    Smola, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [25] Nonconvex optimization with inertial proximal stochastic variance reduction gradient
    He, Lulu
    Ye, Jimin
    Jianwei, E.
    INFORMATION SCIENCES, 2023, 648
  • [26] Convergence analysis of gradient descent stochastic algorithms
    Shapiro, A
    Wardi, Y
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1996, 91 (02) : 439 - 454
  • [27] On stochastic mirror descent with interacting particles: Convergence properties and variance reduction
    Borovykh, A.
    Kantas, N.
    Parpas, P.
    Pavliotis, G. A.
    PHYSICA D-NONLINEAR PHENOMENA, 2021, 418
  • [28] Accelerated Stochastic Variance Reduction for a Class of Convex Optimization Problems
    He, Lulu
    Ye, Jimin
    Jianwei, E.
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2023, 196 (03) : 810 - 828
  • [30] Accelerated Stochastic Variance Reduction Gradient Algorithms for Robust Subspace Clustering
    Liu, Hongying
    Yang, Linlin
    Zhang, Longge
    Shang, Fanhua
    Liu, Yuanyuan
    Wang, Lijun
    SENSORS, 2024, 24 (11)