A Study of High-Dimensional Data Imputation Using Additive LASSO Regression Model

Cited by: 4
Authors
Lavanya, K. [1]
Reddy, L. S. S. [2]
Reddy, B. Eswara [3]
Affiliations
[1] JNTUA, Dept Comp Sci & Engn, Anantapur 515822, Andhra Pradesh, India
[2] KLU, Dept Comp Sci & Engn, Guntur 522502, Andhra Pradesh, India
[3] JNTUA, Dept Comp Sci, Anantapur 517234, Andhra Pradesh, India
Keywords
High-dimensional data; Multiple imputations; Regression; Missing data; MULTIPLE IMPUTATION; MISSING-DATA; METAANALYSIS; HETEROGENEITY;
DOI
10.1007/978-981-10-8055-5_3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid growth of computational domains such as bioinformatics, finance, engineering, biometrics, and neuroimaging, the need to analyze high-dimensional data has grown. Many real-world datasets contain hundreds or thousands of features. A common difficulty in knowledge-based classification problems is the quality and quantity of the data. In particular, many high-dimensional data samples contain missing or unknown attribute values, incomplete feature vectors, and uncertain or vague data, all of which must be handled carefully. Because a large share of the values in these datasets is missing, refined multiple imputation methods are required to estimate the missing values so that a fair and more consistent analysis can be achieved. In this paper, three multiple imputation (MI) methods, namely mean imputation, predictive mean imputation, and imputation by additive LASSO, are employed in the cloud. Results show that imputation by additive LASSO is the preferred MI method.
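To make the contrast between mean imputation and regression-based imputation concrete, the following is a minimal sketch in plain Python. It is not the authors' additive LASSO procedure and it produces a single imputation rather than multiple imputations: it fits an ordinary linear LASSO by coordinate descent on the complete rows and uses the fit to predict the missing entries. The synthetic data, the penalty value `lam=0.05`, and all function names are assumptions made for illustration only.

```python
import math
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_wo_j = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_wo_j)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
        # The intercept is unpenalized: set it to the mean residual.
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta


def mean_impute(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in column]


def lasso_impute(rows, target, lam=0.05):
    """Fill None entries in column `target` with LASSO predictions.

    Assumes every other column is fully observed (a simplification).
    """
    others = [k for k in range(len(rows[0])) if k != target]
    train = [r for r in rows if r[target] is not None]
    b0, beta = lasso_fit(
        [[r[k] for k in others] for r in train],
        [r[target] for r in train],
        lam,
    )
    filled = []
    for r in rows:
        new = list(r)
        if new[target] is None:
            new[target] = b0 + sum(r[k] * b for k, b in zip(others, beta))
        filled.append(new)
    return filled


def demo(seed=0, n=80):
    """Compare imputation error of both methods on synthetic data."""
    rng = random.Random(seed)
    rows, truth = [], {}
    for i in range(n):
        x0, x1, x2 = (rng.gauss(0, 1) for _ in range(3))
        y = 2.0 * x0 - 1.0 * x1 + rng.gauss(0, 0.1)  # x2 is irrelevant
        if i % 4 == 0:  # hide every fourth target value
            truth[i] = y
            y = None
        rows.append([x0, x1, x2, y])
    by_mean = mean_impute([r[3] for r in rows])
    by_lasso = lasso_impute(rows, target=3)

    def rmse(pred):
        return math.sqrt(sum((pred[i] - truth[i]) ** 2 for i in truth) / len(truth))

    return rmse(by_mean), rmse([r[3] for r in by_lasso])


if __name__ == "__main__":
    print(demo())
```

On data like this, where the target depends linearly on a few features, the regression-based fill should track the hidden values far more closely than the column mean. A multiple-imputation variant would additionally draw several plausible values per missing entry instead of a single prediction.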
Pages: 19 - 30
Page count: 12
Related papers
50 in total
  • [21] Multiple imputation in the presence of high-dimensional data
    Zhao, Yize
    Long, Qi
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2016, 25 (05) : 2021 - 2035
  • [22] Multiple imputation with compatibility for high-dimensional data
    Zahid, Faisal Maqbool
    Faisal, Shahla
    Heumann, Christian
    PLOS ONE, 2021, 16 (07):
  • [23] ORACLE INEQUALITIES AND SELECTION CONSISTENCY FOR WEIGHTED LASSO IN HIGH-DIMENSIONAL ADDITIVE HAZARDS MODEL
    Zhang, Haixiang
    Sun, Liuquan
    Zhou, Yong
    Huang, Jian
    STATISTICA SINICA, 2017, 27 (04) : 1903 - 1920
  • [24] High-dimensional missing data imputation via undirected graphical model
    Lee, Yoonah
    Park, Seongoh
    STATISTICS AND COMPUTING, 2024, 34 (05)
  • [25] High-Dimensional Sparse Additive Hazards Regression
    Lin, Wei
    Lv, Jinchi
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2013, 108 (501) : 247 - 264
  • [26] Bi-selection in the high-dimensional additive hazards regression model
    Liu, Li
    Su, Wen
    Zhao, Xingqiu
    ELECTRONIC JOURNAL OF STATISTICS, 2021, 15 (01) : 748 - 772
  • [27] Weighted multiple blockwise imputation method for high-dimensional regression with blockwise missing data
    Li, Jingmao
    Zhang, Qingzhao
    Chen, Song
    Fang, Kuangnan
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2023, 93 (03) : 459 - 474
  • [28] The sparsity and bias of the lasso selection in high-dimensional linear regression
    Zhang, Cun-Hui
    Huang, Jian
    ANNALS OF STATISTICS, 2008, 36 (04) : 1567 - 1594
  • [29] Lasso penalized model selection criteria for high-dimensional multivariate linear regression analysis
    Katayama, Shota
    Imori, Shinpei
    JOURNAL OF MULTIVARIATE ANALYSIS, 2014, 132 : 138 - 150
  • [30] Multiple imputation for longitudinal data using Bayesian lasso imputation model
    Yamaguchi, Yusuke
    Yoshida, Satoshi
    Misumi, Toshihiro
    Maruo, Kazushi
    STATISTICS IN MEDICINE, 2022, 41 (06) : 1042 - 1058