A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC

Cited by: 0
Authors
Changyou CHEN [1 ]
Wenlin WANG [2 ]
Yizhe ZHANG [3 ]
Qinliang SU [4 ]
Lawrence CARIN [2 ]
Affiliations
[1] Department of Computer Science and Engineering
[2] Department of Electrical and Computer Engineering, Duke University
[3] Microsoft Research
[4] School of Data and Computer Science, Sun Yat-sen University
Keywords
Markov chain Monte Carlo; SG-MCMC; variance reduction; deep neural networks
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on the algorithm's convergence rate. In this paper, we prove that at the beginning of an SG-MCMC algorithm, i.e., under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound. The reason is the prominent noise of stochastic gradients computed on small minibatches, which motivates the need for variance reduction in SG-MCMC for practical use. Borrowing ideas from stochastic optimization, we propose a simple and practical variance-reduction technique for SG-MCMC that is efficient in both computation and storage. More importantly, we develop the theory to prove that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
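To make the idea concrete, the sketch below shows an SVRG-style control-variate update for stochastic gradient Langevin dynamics (SGLD), the kind of variance-reduced gradient estimator the abstract alludes to. The function names (svrg_sgld, grad_log_lik, grad_log_prior) and the epoch-wise anchor schedule are illustrative assumptions, not the paper's exact algorithm, which the authors design to be more efficient in computation and storage.

```python
import numpy as np

def svrg_sgld(grad_log_lik, grad_log_prior, theta0, n_data,
              step_size, n_epochs, minibatch_size, seed=0):
    """Minimal sketch of SVRG-style variance-reduced SGLD (illustrative only).

    grad_log_lik(theta, idx): sum of per-example log-likelihood gradients
        over the data indices in `idx`.
    grad_log_prior(theta): gradient of the log-prior.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_epochs):
        # Anchor point: recompute the full-data gradient once per epoch.
        theta_anchor = theta.copy()
        full_grad = grad_log_lik(theta_anchor, np.arange(n_data))
        for _ in range(n_data // minibatch_size):
            idx = rng.choice(n_data, size=minibatch_size, replace=False)
            scale = n_data / minibatch_size
            # Control-variate estimate of the full log-likelihood gradient:
            # minibatch correction around the anchor plus the stored full gradient.
            g = (grad_log_prior(theta)
                 + scale * (grad_log_lik(theta, idx)
                            - grad_log_lik(theta_anchor, idx))
                 + full_grad)
            # Langevin update: gradient ascent step plus injected Gaussian noise.
            theta = theta + step_size * g \
                    + np.sqrt(2.0 * step_size) * rng.normal(size=theta.shape)
            samples.append(theta.copy())
    return samples
```

The stored full gradient at the anchor acts as the control variate: the minibatch term only estimates the difference between the current and anchor gradients, so the variance of the estimator shrinks as the iterate stays close to the anchor, at the cost of one full-data gradient per epoch.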
Pages: 67-79
Number of pages: 13
Related papers
50 records in total
• [1] A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC. Chen, Changyou; Wang, Wenlin; Zhang, Yizhe; Su, Qinliang; Carin, Lawrence. SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (01)
• [2] A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC. Chen, Changyou; Wang, Wenlin; Zhang, Yizhe; Su, Qinliang; Carin, Lawrence. Science China Information Sciences, 2019, 62
• [3] Decentralized Stochastic Optimization and Machine Learning: A Unified Variance-Reduction Framework for Robust Performance and Fast Convergence. Xin, Ran; Kar, Soummya; Khan, Usman A. IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (03): 102-113
• [4] New class of variance-reduction techniques using lattice symmetries. Blum, Thomas; Izubuchi, Taku; Shintani, Eigo. PHYSICAL REVIEW D, 2013, 88 (09)
• [5] Variance Reduction for Dependent Sequences with Applications to Stochastic Gradient MCMC. Belomestny, Denis; Iosipoi, Leonid; Moulines, Eric; Naumov, Alexey; Samsonov, Sergey. SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION, 2021, 9 (02): 507-535
• [6] An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient. Xu, Pan; Gao, Felicia; Gu, Quanquan. 35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115: 541-551
• [7] Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints. Ablin, Pierre; Vary, Simon; Gao, Bin; Absil, P.-A. JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25: 1-38
• [8] On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators. Chen, Changyou; Ding, Nan; Carin, Lawrence. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
• [9] Stochastic Gradient Langevin Dynamics with Variance Reduction. Huang, Zhishen; Becker, Stephen. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
• [10] Stochastic gradient descent with variance reduction technique. Zhang, Jinjing; Hu, Fei; Xu, Xiaofei; Li, Li. WEB INTELLIGENCE, 2018, 16 (03): 187-194