Distributed Bayesian Inference in Massive Spatial Data

Times cited: 3
Authors
Guhaniyogi, Rajarshi [1 ]
Li, Cheng [2 ]
Savitsky, Terrance [3 ]
Srivastava, Sanvesh [4 ]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore, Singapore
[3] US Bur Lab Stat, Washington, DC 20212 USA
[4] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52240 USA
Funding
US National Science Foundation
Keywords
Distributed Bayesian inference; Gaussian process; low-rank Gaussian process; massive spatial data; Wasserstein barycenter; GAUSSIAN PROCESS MODELS; DIVIDE-AND-CONQUER; APPROXIMATION; RATES; LIKELIHOODS; PREDICTION; REGRESSION; CLUSTERS; FIELDS;
DOI
10.1214/22-STS868
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Gaussian process (GP) regression is computationally expensive in spatial applications involving massive data. Various methods address this limitation, including a small number of Bayesian methods based on distributed computations (the divide-and-conquer strategy). Focusing on the latter literature, we achieve three main goals. First, we develop an extensible Bayesian framework for distributed spatial GP regression that embeds many popular methods. The proposed framework has three steps: partition the full data into many subsets, fit a readily available Bayesian spatial process model to every subset in parallel, and combine the subset posterior distributions into a pseudo posterior distribution that conditions on the full data. This combined pseudo posterior distribution replaces the full-data posterior distribution in prediction and inference problems. To demonstrate the framework's generality, we extend posterior computations for (nondistributed) spatial process models with stationary full-rank and nonstationary low-rank GP priors to the distributed setting. Second, we contrast the empirical performance of popular distributed approaches with some widely used nondistributed alternatives and highlight their relative advantages and shortcomings. Third, we provide theoretical support for our numerical observations and show that the Bayes L2-risks of the combined posterior distributions obtained from a subclass of the divide-and-conquer methods achieve near-optimal convergence rates in estimating the true spatial surface under various types of covariance functions. Additionally, we provide upper bounds on the number of subsets needed to achieve these near-optimal rates.
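The three-step partition, fit, and combine workflow summarized in the abstract can be illustrated with a toy, pure-Python sketch for a 1D surface and a single prediction location. This is not the authors' implementation; everything here is an assumption made for illustration: a zero-mean GP with a squared-exponential kernel and fixed, known hyperparameters; the subset likelihood raised to the power k (a device used by some Wasserstein-barycenter methods), which for Gaussian noise is equivalent to dividing the noise variance by k; and combination via the Wasserstein-2 barycenter, which for univariate Gaussians has a closed form (average the means and average the standard deviations). All function names are hypothetical.

```python
import math
import random

def sq_exp(a, b, var=1.0, ls=0.3):
    # Squared-exponential covariance between two 1D locations
    # (hypothetical fixed hyperparameters: variance 1.0, length-scale 0.3).
    return var * math.exp(-0.5 * ((a - b) / ls) ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T; A must be symmetric positive definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    # Solve L x = b by forward substitution (L lower-triangular).
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def gp_posterior(xs, ys, x_star, noise, power=1.0):
    # Posterior mean and variance of the latent f(x_star) under a zero-mean GP.
    # power > 1 raises the Gaussian likelihood to that power, which is the
    # same as shrinking the noise variance to noise / power.
    n = len(xs)
    K = [[sq_exp(xs[i], xs[j]) + (noise / power if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    v = forward(L, [sq_exp(x, x_star) for x in xs])  # L^{-1} k_star
    a = forward(L, ys)                                # L^{-1} y
    mean = sum(vi * ai for vi, ai in zip(v, a))
    var = sq_exp(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

def wasserstein_barycenter_1d(means, variances):
    # Equal-weight W2 barycenter of univariate Gaussians: averaging the
    # quantile functions m_j + s_j * z gives mean(m_j) and sd = mean(s_j).
    m = sum(means) / len(means)
    s = sum(math.sqrt(v) for v in variances) / len(variances)
    return m, s * s

def distributed_gp(xs, ys, x_star, k, noise=0.01, seed=0):
    # Step 1: randomly partition the data indices into k subsets.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    subsets = [idx[j::k] for j in range(k)]
    # Step 2: fit the same GP model on each subset (sequential here,
    # but each fit is independent and could run in parallel).
    fits = [gp_posterior([xs[i] for i in s], [ys[i] for i in s],
                         x_star, noise, power=k) for s in subsets]
    # Step 3: combine the subset posteriors into one pseudo posterior.
    return wasserstein_barycenter_1d([m for m, _ in fits],
                                     [v for _, v in fits])

# Toy usage: noisy sin(2*pi*x) observed at 200 random locations on [0, 1].
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]
full = gp_posterior(xs, ys, 0.25, noise=0.01)  # full-data posterior
dc = distributed_gp(xs, ys, 0.25, k=5)         # combined pseudo posterior
print("full-data:", full)
print("divide-and-conquer:", dc)
```

The closed-form barycenter is what keeps step 3 cheap in this sketch. When subset posteriors are represented by MCMC draws rather than Gaussians, the univariate barycenter can instead be approximated by averaging the empirical quantiles of the subset draws.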
Pages: 262-284
Number of pages: 23
Related papers
50 in total
  • [41] Bayesian inference analysis of ellipsometry data
    Barradas, NP
    Keddie, JL
    Sackin, R
    PHYSICAL REVIEW E, 1999, 59 (05): : 6138 - 6151
  • [42] Objective Bayesian Inference for Bilateral Data
    M'lan, Cyr Emile
    Chen, Ming-Hui
    BAYESIAN ANALYSIS, 2015, 10 (01): : 139 - 170
  • [43] Condition monitoring of distributed systems using two-stage Bayesian inference data fusion
    Jaramillo, Victor H.
    Ottewill, James R.
    Dudek, Rafel
    Lepiarczyk, Dariusz
    Pawlik, Pawel
    MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2017, 87 : 91 - 110
  • [44] Efficient Design and Inference in Distributed Bayesian Networks: An Overview
    de Oude, Patrick
    Groen, Frans C. A.
    Pavlin, Gregor
    LOGIC, LANGUAGE, AND COMPUTATION, 2011, 6618 : 125 - 144
  • [45] Decentralized Approximate Bayesian Inference for Distributed Sensor Network
    Gholami, Behnam
    Yoon, Sejong
    Pavlovic, Vladimir
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1582 - 1588
  • [46] Backward inference in Bayesian networks for distributed systems management
    Ding J.
    Krämer B.
    Bai Y.
    Chen H.
    Journal of Network and Systems Management, 2005, 13 (4) : 409 - 427
  • [47] Sparse Bayesian Inference based Direct Localization for Massive MIMO
    Liu, Guanying
    Liu, An
    Lian, Lixiang
    Lau, Vincent
    Zhao, Min-Jian
    2019 IEEE 90TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2019-FALL), 2019,
  • [48] Nonparametric Bayesian Aggregation for Massive Data
    Shang, Zuofeng
    Hao, Botao
    Cheng, Guang
    JOURNAL OF MACHINE LEARNING RESEARCH, 2019, 20
  • [50] Task scheduling of massive spatial data processing across distributed data centers: what's new?
    Song, Weijing
    Yue, Shasha
    Wang, Lizhe
    Zhang, Wanfeng
    Liu, Dingsheng
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 976 - 981