A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 260
Authors
Pudjihartono, Nicholas [1]
Fadason, Tayaza [1, 2]
Kempa-Liehr, Andreas W. [3]
O'Sullivan, Justin M. [1, 2, 4, 5, 6]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Source
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One promising application of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy, "non-informative," irrelevant, and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
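As a rough illustration of the idea in the abstract, the sketch below ranks SNP columns by their absolute correlation with a binary disease label and keeps the top k. It is a minimal univariate filter, not the method of the reviewed article. The data are synthetic, and the names `pearson_abs`, `select_top_k`, the genotype coding 0/1/2, and the two "causal" column indices are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def pearson_abs(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return abs(cov / (sx * sy))


def select_top_k(genotypes, labels, k):
    """Score each SNP column against the label and return the k best column indices."""
    n_snps = len(genotypes[0])
    scores = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scores.append((pearson_abs(column, labels), j))
    scores.sort(reverse=True)  # highest score first
    return sorted(j for _, j in scores[:k])


# Synthetic toy data (hypothetical): 200 samples, 50 SNPs coded 0/1/2.
# Only columns 3 and 17 influence the label; the other 48 are noise.
rng = random.Random(0)
n_samples, n_snps = 200, 50
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)]
             for _ in range(n_samples)]
labels = [1 if row[3] + row[17] >= 3 else 0 for row in genotypes]

selected = select_top_k(genotypes, labels, k=2)
print("Selected SNP columns:", selected)
```

A univariate filter like this scores each feature independently, so it is cheap when features far outnumber samples, but it cannot detect features that matter only in combination and it does not remove redundant (correlated) features.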
Pages: 17
Related papers
50 in total
  • [41] Machine Learning-Based Prediction Methods for Home Burglary Crimes
    Wen, Shuo
    Li, Xiaomin
    Zhao, Lixuan
    Wu, Qi
    Du, Wei
    Jiang, Shangxuan
    JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (02) : 123 - 130
  • [42] Prediction of software quality with Machine Learning-Based ensemble methods
    Ceran A.A.
    Ar Y.
    Tanrıöver Ö.Ö.
    Seyrek Ceran S.
    Materials Today: Proceedings, 2023, 81 : 18 - 25
  • [43] Risk prediction of coal mine rock burst based on machine learning and feature selection algorithm
    Miao, Dejun
    Yao, Kaixin
    Wang, Wenhao
    Liu, Lu
    Sui, Xiuhua
    GEORISK-ASSESSMENT AND MANAGEMENT OF RISK FOR ENGINEERED SYSTEMS AND GEOHAZARDS, 2024, 18 (04) : 868 - 881
  • [44] Feature Selection and Validation of a Machine Learning-Based Lower Limb Risk Assessment Tool: A Feasibility Study
    Das, Swagata
    Sakoda, Wataru
    Ramasamy, Priyanka
    Tadayon, Ramin
    Ramirez, Antonio Vega
    Kurita, Yuichi
    SENSORS, 2021, 21 (19)
  • [45] Machine learning-based risk prediction model for cardiovascular disease using a hybrid dataset
    Kanagarathinam, Karthick
    Sankaran, Durairaj
    Manikandan, R.
    DATA & KNOWLEDGE ENGINEERING, 2022, 140
  • [46] Feature selection strategy for machine learning methods in building energy consumption prediction
    Qiao, Qingyao
    Yunusa-Kaltungo, Akilu
    Edwards, Rodger E.
    ENERGY REPORTS, 2022, 8 : 13621 - 13654
  • [47] Machine learning-based feature selection to search stable microbial biomarkers: application to inflammatory bowel disease
    Lee, Youngro
    Cappellato, Marco
    Di Camillo, Barbara
    GIGASCIENCE, 2023, 12
  • [49] A machine learning-based diabetes risk prediction modeling study
    Ming, Jiexiu
    Xu, Junyi
    Zhang, Miaomiao
    Li, Ningyu
    Yan, Xu
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON COMPUTER AND MULTIMEDIA TECHNOLOGY, ICCMT 2024, 2024 : 363 - 369
  • [50] A machine learning-based universal outbreak risk prediction tool
    Zhang, Tianyu
    Rabhi, Fethi
    Chen, Xin
    Paik, Hye-young
    Macintyre, Chandini Raina
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 169