Least Squares Model Averaging for Distributed Data

Cited by: 0
Authors:
Zhang, Haili [1 ]
Liu, Zhaobo [2 ]
Zou, Guohua [3 ]
Affiliations:
[1] Shenzhen Polytech Univ, Inst Appl Math, Shenzhen 518055, Peoples R China
[2] Shenzhen Univ, Inst Adv Study, Shenzhen 518060, Peoples R China
[3] Capital Normal Univ, Sch Math Sci, Beijing 100048, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
consistency; distributed data; divide and conquer algorithm; Mallows' criterion; model averaging; optimality; FOCUSED INFORMATION CRITERION; BIG DATA; REGRESSION; SELECTION; INFERENCE;
DOI:
Not available
Chinese Library Classification:
TP [Automation technology; computer technology];
Discipline code:
0812;
Abstract:
The divide-and-conquer algorithm is a common strategy for analyzing big data. Model averaging has a natural divide-and-conquer structure, but its theory has not been developed for big data settings. The goal of this paper is to fill this gap. We propose two divide-and-conquer-type model averaging estimators for linear models with distributed data. Under some regularity conditions, we show that the weights obtained from the Mallows model averaging criterion converge in L2 to the theoretically optimal weights that minimize the risk of the model averaging estimator. We also derive bounds on the in-sample and out-of-sample mean squared errors and prove the asymptotic optimality of the proposed model averaging estimators. Our conclusions hold even when the dimension and the number of candidate models diverge. Simulation results and an analysis of real airline data show that the proposed model averaging methods outperform commonly used model selection and model averaging methods with distributed data. Our approach contributes to the theory of model averaging for distributed data and parallel computation, and can be applied in big data analysis to save time and reduce the computational burden.
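To make the divide-and-conquer idea concrete, the Python sketch below illustrates a Mallows-type model averaging computation on distributed data blocks: each block contributes only its least-squares sufficient statistics, the nested candidate models are fit from the aggregated statistics, and the averaging weights are chosen by minimizing a Mallows criterion over the weight simplex. This is a minimal illustration under simplifying assumptions (nested candidate models, a plug-in error variance from the largest model), not the authors' exact estimators; all function and variable names are hypothetical.

```python
# Minimal sketch of divide-and-conquer Mallows model averaging (illustrative only;
# not the paper's exact procedure).  Candidate models are nested sets of regressors.
import numpy as np
from scipy.optimize import minimize

def block_stats(X, y, k):
    """Least-squares sufficient statistics of one data block, first k regressors."""
    Xk = X[:, :k]
    return Xk.T @ Xk, Xk.T @ y

def dac_mallows_average(blocks, model_sizes):
    """blocks: list of (X, y) held on different machines; model_sizes: nested dims."""
    X_all = np.vstack([X for X, _ in blocks])
    y_all = np.concatenate([y for _, y in blocks])
    n = len(y_all)

    # Fit each candidate model from aggregated block-level statistics (X'X, X'y).
    fitted = []
    for k in model_sizes:
        XtX, Xty = np.zeros((k, k)), np.zeros(k)
        for X, y in blocks:
            A, b = block_stats(X, y, k)
            XtX += A
            Xty += b
        beta = np.linalg.solve(XtX, Xty)
        fitted.append(X_all[:, :k] @ beta)
    F = np.column_stack(fitted)                      # n x M matrix of fitted values

    # Plug-in error variance from the largest candidate model (a common choice).
    idx_max = int(np.argmax(model_sizes))
    sigma2 = np.sum((y_all - F[:, idx_max]) ** 2) / (n - max(model_sizes))

    # Mallows criterion: ||y - F w||^2 + 2 * sigma^2 * sum_m w_m k_m, w on the simplex.
    sizes = np.asarray(model_sizes, dtype=float)
    def criterion(w):
        return np.sum((y_all - F @ w) ** 2) + 2.0 * sigma2 * sizes @ w

    M = len(model_sizes)
    res = minimize(criterion, np.full(M, 1.0 / M),
                   bounds=[(0.0, 1.0)] * M,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return res.x, F @ res.x                          # weights and averaged fit

# Example usage on simulated data split across four "machines".
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ np.array([1.0, 0.8, 0.5, 0.3, 0.0, 0.0, 0.0, 0.0]) + rng.normal(size=1000)
blocks = [(X[i::4], y[i::4]) for i in range(4)]
w, yhat = dac_mallows_average(blocks, model_sizes=[2, 4, 6, 8])
```

Because only the aggregated cross-product matrices travel between machines, the per-block communication cost depends on the number of regressors rather than the block sample size, which is the computational appeal of the divide-and-conquer approach described in the abstract.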
Pages: 59