Optimal distributed subsampling under heterogeneity

Cited: 0
Authors
Shao, Yujing [1 ,2 ]
Wang, Lei [1 ,2 ]
Lian, Heng [3 ]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
[3] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
ADMM; Heterogeneity; Nonsmooth loss; Random perturbation; Site-specific nuisance parameters; REGRESSION;
DOI
10.1007/s11222-024-10558-7
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Distributed subsampling approaches have been proposed to process massive data in a distributed computing environment, where subsamples are taken from each site and then analyzed collectively to address statistical problems when the full data are not available. In this paper, we consider the setting in which all sites share a common parameter but each site has its own nuisance parameters, and we formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex, possibly nonsmooth, loss functions. By establishing the consistency and asymptotic normality of the distributed subsample estimators for the common parameter of interest, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resulting estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) algorithm and a random perturbation procedure are proposed to obtain the subsample estimator and to estimate the covariance matrices for statistical inference, respectively. The finite-sample performance of the method is demonstrated for linear, logistic and quantile regression models through simulation studies, and an application to the National Longitudinal Survey of Youth dataset is also provided.
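To make the two-step idea concrete, below is a minimal Python sketch of a generic two-step optimal subsampling estimator for logistic regression on a single site, in the spirit of A-optimality-based schemes: a uniform pilot subsample yields a pilot estimate, optimal-style probabilities are computed from it, and an inverse-probability-weighted second-step subsample gives the final estimator. The probability formula, sample sizes, and all function names here are illustrative assumptions, not the authors' exact procedure (which covers multiple sites, site-specific nuisance parameters, and nonsmooth losses).

import numpy as np
from scipy.optimize import minimize

def neg_loglik(beta, X, y, w):
    """Weighted negative log-likelihood of logistic regression."""
    eta = X @ beta
    # logaddexp(0, eta) = log(1 + exp(eta)), computed stably
    return np.sum(w * (np.logaddexp(0.0, eta) - y * eta))

def fit_weighted_logistic(X, y, w):
    """Minimize the weighted loss from a zero start (pilot or final fit)."""
    beta0 = np.zeros(X.shape[1])
    return minimize(neg_loglik, beta0, args=(X, y, w), method="BFGS").x

def two_step_subsample(X, y, r0=500, r=2000, seed=None):
    """Illustrative two-step subsampling (A-optimality flavor).

    Step 1: uniform pilot subsample -> pilot estimate beta_pilot.
    Step 2: draw r indices with probabilities proportional to
            |y_i - p_i(beta_pilot)| * ||x_i||  (hypothetical choice),
            then solve the 1/pi-weighted problem so the subsample
            objective stays asymptotically unbiased for the full-data one.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: uniform pilot subsample of size r0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))

    # Subsampling probabilities computed from the pilot fit.
    p = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()

    # Step 2: probability-weighted subsample estimator.
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return fit_weighted_logistic(X[idx], y[idx], 1.0 / pi[idx])

One note on the design: the 1/pi weights in the second step are what keep the subsample estimating equation unbiased for the full-data one. For inference under a nonsmooth loss, the random perturbation idea mentioned in the abstract would roughly correspond to refitting with i.i.d. positive random weights multiplied into the loss terms and using the spread of the refits to estimate the covariance; the exact construction is given in the paper.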
Pages: 20
Related Papers
50 records
  • [41] Fixing by Mixing: A Recipe for Optimal Byzantine ML under Heterogeneity
    Allouah, Youssef
    Gupta, Nirupam
    Farhadkhani, Sadegh
    Pinot, Rafael
    Guerraoui, Rachid
    Stephan, John
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [42] Optimal subsampling design for polynomial regression in one covariate
    Reuter, Torsten
    Schwabe, Rainer
    STATISTICAL PAPERS, 2023, 64 (04) : 1095 - 1117
  • [43] Approximating Partial Likelihood Estimators via Optimal Subsampling
    Zhang, Haixiang
    Zuo, Lulu
    Wang, HaiYing
    Sun, Liuquan
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024, 33 (01) : 276 - 288
  • [44] Optimal subsampling for large-scale quantile regression
    Ai, Mingyao
    Wang, Fei
    Yu, Jun
    Zhang, Huiming
    JOURNAL OF COMPLEXITY, 2021, 62
  • [45] Optimal subsampling for composite quantile regression in big data
    Yuan, Xiaohui
    Li, Yong
    Dong, Xiaogang
    Liu, Tianqing
    STATISTICAL PAPERS, 2022, 63 (05) : 1649 - 1676
  • [46] Optimal subsampling for high-dimensional ridge regression
    Li, Hanyu
    Niu, Chengmei
    KNOWLEDGE-BASED SYSTEMS, 2024, 286
  • [49] Optimal distributed subsampling for high-dimensional linear measurement error models via doubly bias-corrected score
    Gao, Junzhuo
    Wang, Lei
    ANALYSIS AND APPLICATIONS, 2025
  • [50] Subsampling sparse graphons under minimal assumptions
    Lunde, Robert
    Sarkar, Purnamrita
    BIOMETRIKA, 2023, 110 (01) : 15 - 32