Distributed Bayesian posterior voting strategy for massive data

Cited by: 1
Authors
Li, Xuerui [1]
Kang, Lican [2]
Liu, Yanyan [1]
Wu, Yuanshan [3]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan, Peoples R China
[2] NUS Med Sch, Ctr Quantitat Med Duke, Singapore, Singapore
[3] Zhongnan Univ Econ, Sch Stat & Math, Wuhan, Peoples R China
Source
ELECTRONIC RESEARCH ARCHIVE | 2022 / Volume 30 / Issue 05
Keywords
hierarchical Bayes formulation; massive data; majority voting; split-and-conquer; shrinkage prior; variable selection; regression
DOI
10.3934/era.2022098
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
The emergence of massive data has driven recent interest in statistical learning methods and large-scale algorithms for analysis on distributed platforms. One widely used statistical approach is split-and-conquer (SaC), which originally aggregated all local solutions through a simple average to reduce the computational burden caused by communication costs. Aiming for lower computational cost with acceptable accuracy, this paper extends SaC to Bayesian variable selection for ultrahigh-dimensional linear regression and builds BVSaC for aggregation. Suppose ultrahigh-dimensional data are stored in a distributed manner across multiple computing nodes, with each node holding a disjoint subset of the data. On each node, we perform variable selection and coefficient estimation through a hierarchical Bayes formulation. Then a weighted majority voting method, BVSaC, combines the local results to retain good performance. The proposed approach requires only a small portion of the computation on each local dataset and therefore eases the computational burden, especially in Bayesian computation, while sacrificing little accuracy, which in turn increases the feasibility of analyzing extraordinarily large datasets. Simulations and a real-world example show that the proposed approach performs as well as the whole-sample hierarchical Bayes method in terms of the accuracy of variable selection and estimation.
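The split, local-fit, and vote pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's BVSaC procedure: the abstract does not give the hierarchical prior or the voting weights, so the sketch assumes a conjugate Gaussian (ridge-type) prior as a stand-in for the local Bayesian step, selects a variable when its approximate 95% credible interval excludes zero, and weights each node's vote by its share of the sample. The names `local_fit`, `weighted_majority_vote`, `tau2`, and `threshold` are hypothetical.

```python
import numpy as np


def local_fit(X, y, tau2=10.0):
    """Stand-in local step (assumption, not the paper's hierarchical Bayes).

    Uses the conjugate prior beta ~ N(0, sigma2 * tau2 * I), whose posterior
    is N(A^{-1} X'y, sigma2 * A^{-1}) with A = X'X + I / tau2.
    """
    n, p = X.shape
    A_inv = np.linalg.inv(X.T @ X + np.eye(p) / tau2)
    beta_hat = A_inv @ X.T @ y                      # posterior mean
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / max(n - p, 1)          # plug-in noise variance
    post_sd = np.sqrt(sigma2 * np.diag(A_inv))      # posterior std. deviations
    # Select variables whose approximate 95% credible interval excludes zero.
    selected = np.abs(beta_hat) > 1.96 * post_sd
    return selected, beta_hat


def weighted_majority_vote(X, y, n_nodes=5, threshold=0.5, seed=0):
    """Split rows into disjoint blocks, fit locally, and combine by voting."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(y)), n_nodes)
    p = X.shape[1]
    votes = np.zeros(p)        # weighted share of nodes selecting each variable
    beta_sum = np.zeros(p)     # weighted sum of local estimates, selectors only
    for block in blocks:
        w = len(block) / len(y)                     # assumed weight: sample share
        selected, beta_hat = local_fit(X[block], y[block])
        votes += w * selected
        beta_sum += w * selected * beta_hat
    keep = votes > threshold
    beta_agg = np.zeros(p)
    # Average the local estimates over the nodes that voted for the variable.
    beta_agg[keep] = beta_sum[keep] / votes[keep]
    return keep, beta_agg


# Small synthetic demonstration (hypothetical data, not from the paper).
rng = np.random.default_rng(1)
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
X_demo = rng.standard_normal((2000, beta_true.size))
y_demo = X_demo @ beta_true + rng.standard_normal(2000)
keep, beta_agg = weighted_majority_vote(X_demo, y_demo)
print("selected indices:", np.flatnonzero(keep))
print("aggregated coefficients:", np.round(beta_agg, 2))
```

Each node only inverts a matrix built from its own block, so the per-node cost depends on the block size rather than the full sample size. A variable survives only if nodes carrying more than half of the total weight select it, which damps false positives that appear on a single node.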
Pages: 1936 - 1953
Page count: 18