Elastic net-based high dimensional data selection for regression

Cited by: 10
Authors
Chamlal, Hasna [1]
Benzmane, Asmaa [1]
Ouaderhman, Tayeb [1]
Affiliations
[1] Hassan II Univ, Fac Sci Ain Chock, Dept Math & Informat, Fundamental & Appl Math Lab, Casablanca, Morocco
Keywords
Feature screening; Regression; Rank correlation; High-dimensional data; Elastic net; VIEW; REGULARIZATION; ALGORITHM;
DOI
10.1016/j.eswa.2023.122958
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-dimensional feature selection is of particular interest to researchers. In some domains, such as microarray data analysis, it is quite common for a group of highly correlated explanatory variables to be equally important for inclusion in the predictive model. This paper proposes a new hybrid feature selection approach, K-EN, that integrates feature screening based on Kendall's tau with Elastic Net regularized regression. Because K-EN embeds the Elastic Net, it inherits the grouping effect, which automatically includes all the highly correlated variables in a group. The K-EN approach offers insightful solutions to high-dimensional regression problems and improves Elastic Net performance, since the Elastic Net phase is preceded by a step that further reduces the number of explanatory variables by removing those that disagree with the target according to Kendall's tau. The use of Kendall's tau further enhances Elastic Net performance, as it handles heavy-tailed distributions, non-parametric models, outliers, and non-normal data more robustly. Because fewer variables reach the regression step, K-EN is also a time-saving approach. The proposed algorithm is evaluated on four simulation scenarios and four publicly available datasets (riboflavin, eyedata, Longley, and Boston Housing), on which it achieves Mean Squared Errors (MSE) of 0.2528, 0.0098, 0.1007, and 0.4121, respectively. These MSEs are lower than those achieved by the state-of-the-art approaches reviewed in this paper. In addition, K-EN selects up to 100% of the relevant features when run on simulated data.
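The two-stage idea in the abstract (screen with Kendall's tau, then fit an Elastic Net on the surviving variables) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The top-`keep` rule on |tau|, the penalty values `alpha` and `l1_ratio`, the coordinate-descent solver, and the helper names `screen`, `elastic_net`, `k_en`, and `make_data` are all assumptions made for illustration. In practice one would more likely rely on `scipy.stats.kendalltau` and `sklearn.linear_model.ElasticNet`.

```python
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def screen(X, y, keep):
    """Indices of the `keep` columns of X with the largest |tau| against y.

    Assumption: a simple top-k rule stands in for the paper's screening rule.
    """
    p = len(X[0])
    scores = [abs(kendall_tau([row[j] for row in X], y)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def soft_threshold(z, g):
    """Shrink z toward zero by g (the lasso proximal operator)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent on centered data for the objective
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    cols = []
    for j in range(p):
        m = sum(row[j] for row in X) / n
        cols.append([row[j] - m for row in X])
    resid = [yi - y_mean for yi in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Covariance of column j with the partial residual (feature j added back)
            rho = sum(v * r for v, r in zip(col, resid)) / n + norm * beta[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (norm + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * v for r, v in zip(resid, col)]
                beta[j] = new
    return beta


def k_en(X, y, keep=5, alpha=0.1, l1_ratio=0.5):
    """Screen with Kendall's tau, then fit an Elastic Net on the kept columns.

    Returns {original column index: coefficient} for nonzero coefficients.
    """
    kept = screen(X, y, keep)
    X_kept = [[row[j] for j in kept] for row in X]
    beta = elastic_net(X_kept, y, alpha, l1_ratio)
    return {j: b for j, b in zip(kept, beta) if b != 0.0}


def make_data(n=60, p=30, seed=0):
    """Synthetic data: features 0 and 1 are highly correlated and both relevant."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[1] = row[0] + 0.1 * rng.gauss(0, 1)  # near-duplicate of feature 0
        X.append(row)
        y.append(2 * row[0] + 2 * row[1] + rng.gauss(0, 0.5))
    return X, y


X, y = make_data()
coefs = k_en(X, y)
print(coefs)
```

On this synthetic example, features 0 and 1 form a correlated group. The ridge part of the penalty tends to keep both in the model with similar weights rather than picking one arbitrarily, which is the grouping effect the abstract refers to.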
Pages: 14