Elastic net-based high dimensional data selection for regression

Cited by: 10
Authors
Chamlal, Hasna [1]
Benzmane, Asmaa [1]
Ouaderhman, Tayeb [1]
Affiliations
[1] Hassan II Univ, Fac Sci Ain Chock, Dept Math & Informat, Fundamental & Appl Math Lab, Casablanca, Morocco
Keywords
Feature screening; Regression; Rank correlation; High-dimensional data; Elastic net; VIEW; REGULARIZATION; ALGORITHM;
DOI
10.1016/j.eswa.2023.122958
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-dimensional feature selection is of particular interest to researchers. In some domains, such as microarray data analysis, it is quite common for a group of highly correlated explanatory variables to be of equal importance for inclusion in the predictive model. This paper proposes a new hybrid feature selection approach, K-EN, that integrates feature screening based on Kendall's tau with Elastic Net regularized regression. Because K-EN embeds the Elastic Net, it benefits from the grouping effect, which automatically includes all the highly correlated variables in a group. The K-EN approach offers insightful solutions to high-dimensional regression problems and improves Elastic Net performance, since the screening phase is preceded by a step that further reduces the number of explanatory variables by removing those that disagree with the target based on Kendall's tau. The use of Kendall's tau further enhances Elastic Net performance, as it robustly handles heavy-tailed distributions, non-parametric models, outliers, and non-normal data. K-EN is therefore a time-saving approach. The proposed algorithm is evaluated on four simulation scenarios and four publicly available datasets (riboflavin, eyedata, Longley, and Boston Housing), on which it achieves Mean Squared Errors (MSEs) of 0.2528, 0.0098, 0.1007, and 0.4121, respectively. These MSEs are the best among those achieved by the state-of-the-art approaches reviewed in this paper. In addition, K-EN selects up to 100% of the relevant features when run on simulated data.
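The two-stage idea in the abstract (screen features with Kendall's tau, then fit an Elastic Net on the survivors) can be illustrated with a minimal pure-Python sketch. This is not the authors' K-EN algorithm: the abstract does not give the screening rule or tuning values, so the top-`keep` rule by |tau|, the parameters `lam` and `alpha`, the coordinate-descent solver, and the toy data are all assumptions made for illustration.

```python
import random

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

def screen(X, y, keep):
    """Assumed rule: keep the `keep` columns with the largest |tau| against y."""
    p = len(X[0])
    scores = [abs(kendall_tau([row[j] for row in X], y)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def soft_threshold(z, g):
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent (no intercept) for
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # current residual y - X @ beta
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            # correlation of column j with the partial residual
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (norm + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Toy data (hypothetical): features 0 and 1 are nearly identical and both
# drive y; the remaining 28 features are pure noise.
rng = random.Random(0)
n, p = 60, 30
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
for row in X:
    row[1] = row[0] + 0.05 * rng.gauss(0, 1)
y = [2 * row[0] + 2 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]

kept = screen(X, y, keep=5)                       # stage 1: screening
beta = elastic_net([[row[j] for j in kept] for row in X], y)  # stage 2
coef = dict(zip(kept, beta))                      # feature index -> coefficient
print(kept, coef)
```

In this toy setting the two correlated features both survive screening and both receive nonzero coefficients, which is the grouping behaviour the abstract attributes to the Elastic Net. In practice a library implementation (for example a Kendall's tau routine and an Elastic Net estimator from a statistics package) would replace these hand-written helpers.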
Pages: 14