A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 260
Authors
Pudjihartono, Nicholas [1]
Fadason, Tayaza [1, 2]
Kempa-Liehr, Andreas W. [3]
O'Sullivan, Justin M. [1, 2, 4, 5, 6]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Source
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy, "non-informative," irrelevant, and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
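As a rough illustration of the feature selection idea summarized in the abstract, the following is a minimal sketch of one filter-type approach: ranking features by their mutual information with the disease label and keeping the top k. It is not the authors' method. The function names (`mutual_information`, `select_top_k`) and the toy genotype matrix (rows are samples, columns are SNPs coded 0/1/2) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def select_top_k(X, y, k):
    """Score every column of X against y and return the k best column indices."""
    n_features = len(X[0])
    scores = [
        mutual_information([row[j] for row in X], y)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 8 samples, 3 SNPs coded as 0/1/2 allele counts.
# SNP 0 tracks the label exactly, SNP 1 is independent of it,
# and SNP 2 is only partially informative.
X = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [2, 0, 1],
    [2, 1, 1],
    [2, 0, 2],
    [2, 1, 2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = case

selected, scores = select_top_k(X, y, k=2)
print(selected)  # [0, 2]
```

On this toy data the uninformative SNP is discarded. A univariate filter like this scores each feature independently, so it cannot detect redundancy between features or interactions among them.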
Pages: 17
Related papers
50 in total
  • [31] Heart Diseases Prediction for Optimization based Feature Selection and Classification using Machine Learning Methods
    Rajinikanth, N.
    Pavithra, L.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (02) : 636 - 643
  • [32] The impact of feature selection methods on machine learning-based docking prediction of Indonesian medicinal plant compounds and HIV-1 protease
    Pujianto, Rahman
    Gultom, Yohanes
    Wibisono, Ari
    Yanuar, Arry
    Suhartanto, Heru
    2019 11TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS 2019), 2019, : 181 - 186
  • [33] A Feature Ranking and Selection Algorithm for Machine Learning-Based Step Counters
    Vandermeeren, Stef
    Van de Velde, Samuel
    Bruneel, Herwig
    Steendam, Heidi
    IEEE SENSORS JOURNAL, 2018, 18 (08) : 3255 - 3265
  • [34] Application of information theoretic feature selection and machine learning methods for the development of genetic risk prediction models
    Jalali-najafabadi, Farideh
    Stadler, Michael
    Dand, Nick
    Jadon, Deepak
    Soomro, Mehreen
    Ho, Pauline
    Marzo-Ortega, Helen
    Helliwell, Philip
    Korendowych, Eleanor
    Simpson, Michael A.
    Packham, Jonathan
    Smith, Catherine H.
    Barker, Jonathan N.
    McHugh, Neil
    Warren, Richard B.
    Barton, Anne
    Bowes, John
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [35] Application of information theoretic feature selection and machine learning methods for the development of genetic risk prediction models
    Farideh Jalali-najafabadi
    Michael Stadler
    Nick Dand
    Deepak Jadon
    Mehreen Soomro
    Pauline Ho
    Helen Marzo-Ortega
    Philip Helliwell
    Eleanor Korendowych
    Michael A. Simpson
    Jonathan Packham
    Catherine H. Smith
    Jonathan N. Barker
    Neil McHugh
    Richard B. Warren
    Anne Barton
    John Bowes
    Scientific Reports, 11
  • [36] Phishing detection based on machine learning and feature selection methods
    Almseidin M.
    Abu Zuraiq A.M.
    Al-kasassbeh M.
    Alnidami N.
    International Journal of Interactive Mobile Technologies, 2019, 13 (12) : 71 - 183
  • [37] Feature selection in single and ensemble learning-based bankruptcy prediction models
    Lin, Wei-Chao
    Lu, Yu-Hsin
    Tsai, Chih-Fong
    EXPERT SYSTEMS, 2019, 36 (01)
  • [38] Machine learning-based genetic feature identification and fatigue life prediction
    Zhou, Kun
    Sun, Xingyue
    Shi, Shouwen
    Song, Kai
    Chen, Xu
    FATIGUE & FRACTURE OF ENGINEERING MATERIALS & STRUCTURES, 2021, 44 (09) : 2524 - 2537
  • [39] A critical review of feature selection methods for machine learning in IoT security
    Li, Jing
    Othman, Mohd Shahizan
    Chen, Hewan
    Yusuf, Lizawati Mi
    INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2024, 30 (03) : 264 - 312
  • [40] Recursive Feature Elimination for Machine Learning-based Landslide Prediction Models
    Munasinghe, Kusala
    Karunanayake, Piyumika
    3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 126 - 129