Optimal distributed subsampling under heterogeneity

Times cited: 0
Authors
Shao, Yujing [1, 2]
Wang, Lei [1, 2]
Lian, Heng [3]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
[3] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
ADMM; Heterogeneity; Nonsmooth loss; Random perturbation; Site-specific nuisance parameters; REGRESSION
DOI
10.1007/s11222-024-10558-7
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Distributed subsampling approaches have been proposed to process massive data in a distributed computing environment: subsamples are drawn from each site and then analyzed collectively to address statistical problems when the full data are not available. In this paper, we consider a setting in which all sites share a common parameter of interest while each site also has its own site-specific nuisance parameters, and we formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex, possibly nonsmooth, loss functions. By establishing the consistency and asymptotic normality of the distributed subsample estimators of the common parameter, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resulting estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) algorithm is proposed to compute the subsample estimator, and a random perturbation procedure is proposed to estimate the covariance matrices for statistical inference. The finite-sample performance of the proposed methods is demonstrated through simulation studies of linear, logistic, and quantile regression models, and an application to the National Longitudinal Survey of Youth dataset is also provided.
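The abstract describes a two-step scheme: a pilot estimate, then optimal subsampling probabilities and per-site allocation sizes. The paper's exact formulas under heterogeneity are not reproduced in this record, so the following is only a minimal Python sketch of the generic idea for homogeneous least squares. It assumes a standard L-optimality-type rule from the optimal subsampling literature, where each point's probability is proportional to the norm of its loss gradient at the pilot estimate, |y_i - x_i'beta_pilot| * ||x_i||, and each site's subsample size is proportional to its summed gradient norms. It deliberately ignores the site-specific nuisance parameters and the nonsmooth-loss machinery (ADMM, random perturbation). All function names (`gradient_norms`, `allocate`, `two_step_subsample_ols`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: two-step L-optimality-type subsampling for least squares
# across K sites. Site-specific nuisance parameters are NOT modelled here.


def gradient_norms(X, y, beta):
    # Norm of the squared-loss gradient at beta: |y_i - x_i'beta| * ||x_i||.
    # A tiny floor keeps every point's sampling probability positive.
    return np.abs(y - X @ beta) * np.linalg.norm(X, axis=1) + 1e-12


def allocate(totals, r):
    # Split the budget r in proportion to the site totals, rounding with the
    # largest-remainder rule so the sizes sum exactly to r.
    raw = r * totals / totals.sum()
    sizes = np.floor(raw).astype(int)
    order = np.argsort(raw - sizes)[::-1]
    sizes[order[: r - sizes.sum()]] += 1
    return sizes


def two_step_subsample_ols(sites, r0, r, rng):
    # Step 1: uniform pilot subsample of size r0 at every site; pooled OLS fit.
    Xp, yp = [], []
    for X, y in sites:
        idx = rng.choice(len(y), size=r0, replace=True)
        Xp.append(X[idx])
        yp.append(y[idx])
    beta_pilot, *_ = np.linalg.lstsq(np.vstack(Xp), np.concatenate(yp), rcond=None)

    # Step 2: sites compute gradient norms locally and share only their totals.
    norms = [gradient_norms(X, y, beta_pilot) for X, y in sites]
    sizes = allocate(np.array([s.sum() for s in norms]), r)

    Xs, ys, ws = [], [], []
    for (X, y), s, r_k in zip(sites, norms, sizes):
        if r_k == 0:
            continue
        p_local = s / s.sum()
        idx = rng.choice(len(y), size=r_k, replace=True, p=p_local)
        Xs.append(X[idx])
        ys.append(y[idx])
        # Inverse of the effective per-draw probability (r_k / r) * p_local.
        ws.append(r / (r_k * p_local[idx]))

    # Inverse-probability-weighted least squares on the pooled subsample.
    sw = np.sqrt(np.concatenate(ws))
    Xw = np.vstack(Xs) * sw[:, None]
    yw = np.concatenate(ys) * sw
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta, sizes


# Usage on simulated data from 4 sites with different covariate scales.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
sites = []
for k in range(4):
    n = 5000
    X = np.column_stack([np.ones(n), (k + 1) * rng.normal(size=(n, 2))])
    y = X @ beta_true + rng.standard_t(df=3, size=n)
    sites.append((X, y))

beta_hat, sizes = two_step_subsample_ols(sites, r0=200, r=1000, rng=rng)
print(sizes, beta_hat)
```

In this sketch only scalar totals and the drawn subsamples leave each site, which mirrors the distributed setting in which the full data are never pooled; sites whose points carry larger gradient norms at the pilot estimate receive larger shares of the budget.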
Pages: 20
Related papers (50 in total)
  • [31] Optimal subsampling for modal regression in massive data
    Yue Chao
    Lei Huang
    Xuejun Ma
    Jiajun Sun
    Metrika, 2024, 87 : 379 - 409
  • [32] Optimal subsampling for linear quantile regression models
    Fan, Yan
    Liu, Yukun
    Zhu, Lixing
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2021, 49 (04): 1039 - 1057
  • [33] Quantifying the distance to criticality under subsampling
    Jens Wilting
    Viola Priesemann
    BMC Neuroscience, 16 (Suppl 1)
  • [34] Finding Robust Itemsets under Subsampling
    Tatti, Nikolaj
    Moerchen, Fabian
    Calders, Toon
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2014, 39 (03)
  • [35] ALGORITHMIC SUBSAMPLING UNDER MULTIWAY CLUSTERING
    Chiang, Harold D.
    Li, Jiatong
    Sasaki, Yuya
    ECONOMETRIC THEORY, 2023
  • [36] Optimal Decorrelated Score Subsampling for High-Dimensional Generalized Linear Models Under Measurement Constraints
    Shao, Yujing
    Wang, Lei
    Lian, Heng
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024
  • [37] International Environmental Agreements: Design of Optimal Transfers Under Heterogeneity
    Calcott, Paul
    Petkov, Vladimir P.
    ENVIRONMENTAL MODELING & ASSESSMENT, 2012, 17 (03) : 209 - 220
  • [38] Optimal fiscal policy under monopolistic competition with firm heterogeneity
    Chang, Cheng-wei
    SCOTTISH JOURNAL OF POLITICAL ECONOMY, 2023, 70 (05) : 423 - 438
  • [40] Optimal Network Membership Estimation under Severe Degree Heterogeneity
    Ke, Zheng Tracy
    Wang, Jingming
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024