Least Squares Model Averaging for Distributed Data

Authors
Zhang, Haili [1]
Liu, Zhaobo [2]
Zou, Guohua [3]
Affiliations
[1] Shenzhen Polytech Univ, Inst Appl Math, Shenzhen 518055, Peoples R China
[2] Shenzhen Univ, Inst Adv Study, Shenzhen 518060, Peoples R China
[3] Capital Normal Univ, Sch Math Sci, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
consistency; distributed data; divide and conquer algorithm; Mallows' criterion; model averaging; optimality; focused information criterion; big data; regression; selection; inference
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The divide-and-conquer algorithm is a common strategy in big data analysis. Model averaging has a natural divide-and-conquer structure, but its theory has not been developed for big data scenarios. The goal of this paper is to fill this gap. We propose two divide-and-conquer-type model averaging estimators for linear models with distributed data. Under some regularity conditions, we show that the weights from the Mallows model averaging criterion converge in L2 to the theoretically optimal weights that minimize the risk of the model averaging estimator. We also give bounds on the in-sample and out-of-sample mean squared errors and prove the asymptotic optimality of the proposed model averaging estimators. Our conclusions hold even when the dimensions and the number of candidate models diverge. Simulation results and an analysis of real airline data illustrate that the proposed methods perform better than commonly used model selection and model averaging methods in distributed data settings. Our approaches contribute to model averaging theory for distributed data and parallel computation, and can be applied in big data analysis to save time and reduce the computational burden.
Pages: 59