A distributed multiple sample testing for massive data

Cited by: 3
Authors
Xie Xiaoyue [1,2]
Shi Jian [1,2]
Song Kai [3]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Math Sci, Beijing, Peoples R China
[3] Beijing Inst Technol, Sch Management & Econ, Beijing, Peoples R China
Keywords
Distributed scheme; hypothesis testing; fraud detection; classification
DOI
10.1080/02664763.2021.1911967
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
When data are stored in a distributed manner, directly applying traditional hypothesis testing procedures is often prohibitive because of communication costs and privacy concerns. This paper develops and investigates a distributed two-node Kolmogorov-Smirnov hypothesis testing scheme, implemented with a divide-and-conquer strategy. Building on the proposed testing scheme, it also provides a distributed fraud detection method and a distribution-based classification method for multi-node machines. The fraud detection method identifies which node in a multi-node system stores fraudulent data; the classification method determines whether the distributions across nodes differ and groups the nodes by distribution. These methods can improve the accuracy of statistical inference in a distributed storage architecture. The feasibility of the proposed methods is verified through simulation studies and real-data examples.
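The abstract does not spell out the testing scheme, so the following is only a minimal sketch of one way a two-node Kolmogorov-Smirnov comparison could avoid moving raw data. It is not the authors' procedure. It assumes both nodes agree on a shared evaluation grid, each node sends only its empirical CDF values on that grid plus its sample size, and a coordinator computes the statistic and the standard asymptotic p-value. The function names (`make_grid`, `local_ecdf_on_grid`, `ks_from_summaries`) and the grid-based approximation are illustrative assumptions. Because the supremum is taken over grid points only, the resulting statistic is a lower bound on the exact KS statistic.

```python
import bisect
import math


def make_grid(lo, hi, m):
    """Hypothetical helper: m evenly spaced grid points shared by both nodes."""
    return [lo + (hi - lo) * i / (m - 1) for i in range(m)]


def local_ecdf_on_grid(sample, grid):
    """Runs on one node: empirical CDF evaluated at the shared grid points.

    Only this list of m numbers and the sample size leave the node.
    """
    xs = sorted(sample)
    n = len(xs)
    return [bisect.bisect_right(xs, g) / n for g in grid], n


def ks_from_summaries(ecdf_a, n_a, ecdf_b, n_b):
    """Runs on the coordinator: grid-approximated two-sample KS test.

    Returns (statistic, asymptotic p-value) using the Kolmogorov series.
    """
    d = max(abs(a - b) for a, b in zip(ecdf_a, ecdf_b))
    lam = math.sqrt(n_a * n_b / (n_a + n_b)) * d
    if lam < 0.2:
        # The alternating series converges poorly here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Toy data: node B holds the same values as node A shifted by 0.5.
    node_a = [i / 1000 for i in range(1000)]
    node_b = [x + 0.5 for x in node_a]
    grid = make_grid(0.0, 1.5, 101)
    ecdf_a, n_a = local_ecdf_on_grid(node_a, grid)
    ecdf_b, n_b = local_ecdf_on_grid(node_b, grid)
    print(ks_from_summaries(ecdf_a, n_a, ecdf_b, n_b))
```

In this sketch the communication cost per node is the grid length rather than the sample size, which is the trade-off a divide-and-conquer design is meant to exploit; a finer grid tightens the approximation at the cost of more transmitted values.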
Pages: 555-573
Page count: 19