A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 260
Authors
Pudjihartono, Nicholas [1 ]
Fadason, Tayaza [1 ,2 ]
Kempa-Liehr, Andreas W. [3 ]
O'Sullivan, Justin M. [1 ,2 ,4 ,5 ,6 ]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., the number of features is far larger than the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy, "non-informative", irrelevant, and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
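The central idea of the abstract, keeping informative SNPs while discarding irrelevant and redundant ones when features far outnumber samples, can be illustrated with a small filter-style sketch. This is an illustration under stated assumptions, not a method taken from the review: relevance is scored with a plug-in mutual information estimate, redundancy is judged with a normalised mutual information cut-off (the hypothetical `redundancy_threshold`), and the genotype matrix is simulated 0/1/2 allele-count data. The names `mutual_information` and `select_snps` are hypothetical.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate (in nats) of I(X; Y) for two discrete 1-D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        in_x = x == xv
        px = in_x.mean()
        for yv in np.unique(y):
            in_y = y == yv
            pxy = np.mean(in_x & in_y)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * in_y.mean()))
    return float(mi)


def select_snps(genotypes, labels, k, redundancy_threshold=0.9):
    """Hypothetical greedy relevance/redundancy filter (illustrative only).

    1. Relevance: rank every SNP column by I(SNP; label).
    2. Redundancy: walk down the ranking and skip a SNP whose MI with an
       already-selected SNP, divided by the smaller of the two entropies,
       exceeds `redundancy_threshold` (e.g. a SNP in near-perfect LD).
    """
    relevance = np.array([mutual_information(genotypes[:, j], labels)
                          for j in range(genotypes.shape[1])])
    order = np.argsort(-relevance, kind="stable")  # most relevant first
    selected = []
    for j in order:
        if len(selected) == k:
            break
        redundant = False
        for s in selected:
            # H(X) = I(X; X), so this normalises by the smaller entropy.
            denom = min(mutual_information(genotypes[:, j], genotypes[:, j]),
                        mutual_information(genotypes[:, s], genotypes[:, s]))
            shared = mutual_information(genotypes[:, j], genotypes[:, s])
            if denom > 0 and shared / denom > redundancy_threshold:
                redundant = True
                break
        if not redundant:
            selected.append(int(j))
    return selected


# Simulated (hypothetical) data: many more SNPs than samples.
rng = np.random.default_rng(0)
n_samples, n_snps = 300, 2000
X = rng.integers(0, 3, size=(n_samples, n_snps))  # 0/1/2 allele counts
y = rng.integers(0, 2, size=n_samples)             # case/control labels
X[:, 0] = 2 * y                                    # strongly informative SNP
X[:, 1] = X[:, 0]                                  # redundant copy (perfect LD)
X[:, 2] = np.where(rng.random(n_samples) < 0.9, y, 1 - y)  # weaker informative SNP
print(select_snps(X, y, k=2))
```

With this simulation the filter is expected to keep SNP 0, skip its redundant copy SNP 1, and keep the weaker but non-redundant SNP 2, while the random SNPs score close to zero. A univariate filter like this is cheap enough for genome-scale data but ignores interactions between SNPs, which is one of the trade-offs a review of feature selection methods weighs against wrapper and embedded approaches.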
Pages: 17
Related papers (50 in total)
  • [1] Prediction of heart disease by classifying with feature selection and machine learning methods
    Gazeloglu, Cengiz
    PROGRESS IN NUTRITION, 2020, 22 (02): 660-670
  • [2] Ensemble feature selection and classification methods for machine learning-based coronary artery disease diagnosis
    Kolukisa, Burak
    Bakir-Gungor, Burcu
    COMPUTER STANDARDS & INTERFACES, 2023, 84
  • [3] Machine Learning-Based Feature Extraction and Selection
    Ruano-Ordas, David
    APPLIED SCIENCES-BASEL, 2024, 14 (15)
  • [4] Review on machine learning-based traffic flow prediction methods
    Yao J.-F.
    He R.
    Shi T.-T.
    Wang P.
    Zhao X.-M.
    Jiaotong Yunshu Gongcheng Xuebao/Journal of Traffic and Transportation Engineering, 2023, 23 (03): 44-67
  • [5] IoT security: a systematic literature review of feature selection methods for machine learning-based attack classification
    Li, Jing
    Othman, Mohd Shahizan
    Hewan, Chen
    Yusuf, Lizawati Mi
    INTERNATIONAL JOURNAL OF ELECTRONIC SECURITY AND DIGITAL FORENSICS, 2025, 17 (1-2): 60-107
  • [6] Incorporating feature selection methods into a machine learning-based neonatal seizure diagnosis
    Acikoglu, Merve
    Tuncer, Seda Arslan
    MEDICAL HYPOTHESES, 2020, 135
  • [7] Feature Selection Based Machine Learning to Improve Prediction of Parkinson Disease
    Nahar, Nazmun
    Ara, Ferdous
    Neloy, Md Arif Istiek
    Biswas, Anik
    Hossain, Mohammad Shahadat
    Andersson, Karl
    BRAIN INFORMATICS, BI 2021, 2021, 12960: 496-508
  • [8] A Machine Learning-Based Wrapper Method for Feature Selection
    Patel, Damodar
    Saxena, Amit
    Wang, John
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2024, 20 (01)
  • [9] Machine Learning-Based Cardiovascular Disease Detection Using Optimal Feature Selection
    Ullah, Tahseen
    Ullah, Syed Irfan
    Ullah, Khalil
    Ishaq, Muhammad
    Khan, Ahmad
    Ghadi, Yazeed Yasin
    Algarni, Abdulmohsen
    IEEE ACCESS, 2024, 12: 16431-16446
  • [10] An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection
    Najafi-Zangeneh, Saeed
    Shams-Gharneh, Naser
    Arjomandi-Nezhad, Ali
    Zolfani, Sarfaraz Hashemkhani
    MATHEMATICS, 2021, 9 (11)