A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 260
Authors
Pudjihartono, Nicholas [1]
Fadason, Tayaza [1, 2]
Kempa-Liehr, Andreas W. [3]
O'Sullivan, Justin M. [1, 2, 4, 5, 6]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy "non-informative," irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Pages: 17
Related papers
50 in total
  • [21] Ensemble Learning-Based Feature Selection for Phage Protein Prediction
    Liu, Songbo
    Cui, Chengmin
    Chen, Huipeng
    Liu, Tong
    FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [22] Feature selection and machine learning methods for optimal identification and prediction of subtypes in Parkinson's disease
    Salmanpour, R. Mohammad
    Shamsaei, Mojtaba
    Rahmim, Arman
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2021, 206
  • [23] A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method
    Emmanuel, Ileberi
    Sun, Yanxia
    Wang, Zenghui
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [24] A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method
    Ileberi Emmanuel
    Yanxia Sun
    Zenghui Wang
    Journal of Big Data, 11
  • [25] Three-stage feature selection approach for deep learning-based RUL prediction methods
    Wang, Youdao
    Zhao, Yifan
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2023, 39 (04) : 1223 - 1247
  • [26] Fracture risk prediction in diabetes patients based on Lasso feature selection and Machine Learning
    Shi, Yu
    Fang, Junhua
    Li, Jiayi
    Yu, Kaiwen
    Zhu, Jingbo
    Lu, Yan
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING, 2024
  • [27] A machine learning-based approach for smart agriculture via stacking-based ensemble learning and feature selection methods
    Ben Abdallah, Emna
    Grati, Rima
    Boukadi, Khouloud
    2022 18TH INTERNATIONAL CONFERENCE ON INTELLIGENT ENVIRONMENTS (IE), 2022
  • [28] Challenges and promises of machine learning-based risk prediction modelling in cardiovascular disease
    Gonzalez-Del-Hoyo, Maribel
    Rossello, Xavier
    EUROPEAN HEART JOURNAL-ACUTE CARDIOVASCULAR CARE, 2021, 10 (08) : 866 - 868
  • [29] A novel feature selection method based on global sensitivity analysis with application in machine learning-based prediction model
    Zhang, Pin
    APPLIED SOFT COMPUTING, 2019, 85
  • [30] Machine learning-based approaches for disease gene prediction
    Duc-Hau Le
    BRIEFINGS IN FUNCTIONAL GENOMICS, 2020, 19 (5-6) : 350 - 363