A Study of High-Dimensional Data Imputation Using Additive LASSO Regression Model

Cited by: 4
Authors
Lavanya, K. [1]
Reddy, L. S. S. [2]
Reddy, B. Eswara [3]
Affiliations
[1] JNTUA, Dept Comp Sci & Engn, Anantapur 515822, Andhra Pradesh, India
[2] KLU, Dept Comp Sci & Engn, Guntur 522502, Andhra Pradesh, India
[3] JNTUA, Dept Comp Sci, Anantapur 517234, Andhra Pradesh, India
Keywords
High-dimensional data; Multiple imputations; Regression; Missing data; MULTIPLE IMPUTATION; MISSING-DATA; METAANALYSIS; HETEROGENEITY;
DOI
10.1007/978-981-10-8055-5_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid growth of computational domains, bioinformatics, finance, engineering, biometrics, and neuroimaging emphasize the necessity of analyzing high-dimensional data. Many real-world datasets contain hundreds or thousands of features. A common problem in most knowledge-based classification tasks is the quality and quantity of data. In particular, many high-dimensional data samples contain missing or unknown attribute values, incomplete feature vectors, and uncertain or vague data, all of which have to be handled carefully. Because a large share of the values in such datasets is missing, refined multiple imputation methods are required to estimate them so that a fair and more consistent analysis can be achieved. In this paper, three imputation methods, namely mean imputation, predictive mean imputation, and imputation by additive LASSO, are employed in the cloud. The results show that imputation by additive LASSO is the preferred multiple imputation (MI) method.
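To make the contrast between mean imputation and regression-based imputation concrete, the following is a minimal sketch and not the authors' method. The abstract does not specify the additive LASSO model, so the sketch substitutes a plain LASSO regression fitted by coordinate descent; the function names (`mean_impute`, `fit_lasso`, `lasso_impute`), the penalty value `lam`, and the synthetic data are all assumptions made for illustration. It also fills each gap once (single imputation) and omits predictive mean imputation; a multiple-imputation procedure would repeat the fill with random draws.

```python
import random

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso(X, y, lam=0.1, n_iter=200):
    """Minimise (1/2n)||y - b0 - Xb||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    resid = [v - y_mean for v in y]  # residual when all coefficients are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the partial residual (column j excluded).
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / norm
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new_b
    intercept = y_mean - sum(beta[j] * x_mean[j] for j in range(p))
    return intercept, beta

def lasso_impute(X, y, lam=0.1):
    """Fill None entries of y with LASSO predictions from fully observed X."""
    obs = [i for i, v in enumerate(y) if v is not None]
    b0, beta = fit_lasso([X[i] for i in obs], [y[i] for i in obs], lam)
    return [b0 + sum(b * x for b, x in zip(beta, X[i])) if v is None else v
            for i, v in enumerate(y)]

# Synthetic example (assumed data): only the first two of five features matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
truth = [2.0 * r[0] - 1.0 * r[1] + rng.gauss(0, 0.1) for r in X]
missing = set(range(0, 60, 6))  # hide every sixth target value
y = [None if i in missing else t for i, t in enumerate(truth)]

def mean_abs_error(filled):
    return sum(abs(filled[i] - truth[i]) for i in missing) / len(missing)

print("mean imputation error :", mean_abs_error(mean_impute(y)))
print("LASSO imputation error:", mean_abs_error(lasso_impute(X, y, lam=0.05)))
```

The sketch assumes the predictor columns are fully observed and only the target column has gaps. The L1 penalty shrinks the coefficients of irrelevant features toward zero, which is the property that makes LASSO-type models attractive when the number of features is large.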
Pages: 19-30
Number of pages: 12