Distributed Bayesian Inference in Massive Spatial Data

Cited by: 3
Authors
Guhaniyogi, Rajarshi [1]
Li, Cheng [2]
Savitsky, Terrance [3]
Srivastava, Sanvesh [4]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore, Singapore
[3] US Bur Lab Stat, Washington, DC 20212 USA
[4] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52240 USA
Funding
US National Science Foundation
Keywords
Distributed Bayesian inference; Gaussian process; low-rank Gaussian process; massive spatial data; Wasserstein barycenter; GAUSSIAN PROCESS MODELS; DIVIDE-AND-CONQUER; APPROXIMATION; RATES; LIKELIHOODS; PREDICTION; REGRESSION; CLUSTERS; FIELDS;
DOI
10.1214/22-STS868
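Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract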
Gaussian process (GP) regression is computationally expensive in spatial applications involving massive data. Various methods address this limitation, including a small number of Bayesian methods based on distributed computations (or the divide-and-conquer strategy). Focusing on the latter literature, we achieve three main goals. First, we develop an extensible Bayesian framework for distributed spatial GP regression that embeds many popular methods. The proposed framework has three steps that partition the entire data into many subsets, apply a readily available Bayesian spatial process model in parallel on all the subsets, and combine the posterior distributions estimated on all the subsets into a pseudo posterior distribution that conditions on the entire data. The combined pseudo posterior distribution replaces the full data posterior distribution in prediction and inference problems. Demonstrating our framework's generality, we extend posterior computations for (nondistributed) spatial process models with stationary full-rank and nonstationary low-rank GP priors to the distributed setting. Second, we contrast the empirical performance of popular distributed approaches with some widely used, nondistributed alternatives and highlight their relative advantages and shortcomings. Third, we provide theoretical support for our numerical observations and show that the Bayes L2-risks of the combined posterior distributions obtained from a subclass of the divide-and-conquer methods achieve the near-optimal convergence rate in estimating the true spatial surface with various types of covariance functions. Additionally, we provide upper bounds on the number of subsets to achieve these near-optimal rates.
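The three steps described in the abstract (partition, fit in parallel, combine) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a one-dimensional input, a squared-exponential covariance with fixed hyperparameters (so each subset posterior is Gaussian in closed form), a random partition, and a pointwise combination that takes the one-dimensional Wasserstein barycenter of the Gaussian subset marginals, which averages their means and their standard deviations. All function names and default values are hypothetical.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between 1-D location vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def subset_posterior(x, y, x_new, noise_var=0.01):
    """Step 2: closed-form GP posterior mean and sd at x_new from one subset.

    Hyperparameters are held fixed (an assumption of this sketch).
    """
    a_mat = sq_exp_kernel(x, x) + noise_var * np.eye(len(x))
    k_star = sq_exp_kernel(x_new, x)
    mean = k_star @ np.linalg.solve(a_mat, y)
    cov = sq_exp_kernel(x_new, x_new) - k_star @ np.linalg.solve(a_mat, k_star.T)
    # Clip tiny negative variances caused by floating-point round-off.
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, sd


def combine_barycenter(means, sds):
    """Step 3: pointwise 1-D Wasserstein barycenter of Gaussian marginals.

    For equally weighted 1-D Gaussians this averages means and sds.
    """
    return np.mean(means, axis=0), np.mean(sds, axis=0)


def distributed_gp(x, y, x_new, n_subsets, seed=0):
    """Steps 1-3: random partition, per-subset fits, combined pseudo posterior."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), n_subsets)  # step 1
    fits = [subset_posterior(x[idx], y[idx], x_new) for idx in parts]
    means = np.stack([m for m, _ in fits])
    sds = np.stack([s for _, s in fits])
    return combine_barycenter(means, sds)
```

The loop over subsets is written serially for clarity; because the subset fits are independent, they could be dispatched to separate workers. Each fit solves a linear system whose size is the subset size rather than the full sample size, which is where the computational saving comes from.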
Pages: 262-284
Number of pages: 23