Comparative analysis of feature selection techniques for COVID-19 dataset

Cited by: 2
Authors
Mohtasham, Farideh [1 ]
Pourhoseingholi, MohamadAmin [2 ]
Nazari, Seyed Saeed Hashemi [3 ]
Kavousi, Kaveh [4 ]
Zali, Mohammad Reza [1 ]
Affiliations
[1] Shahid Beheshti Univ Med Sci, Res Inst Gastroenterol & Liver Dis, Gastroenterol & Liver Dis Res Ctr, Tehran, Iran
[2] Univ Nottingham, Natl Inst Hlth & Care Res NIHR Nottingham Biomed R, Hearing Sci Mental Hlth & Clin Neurosci, Sch Med, Nottingham, England
[3] Shahid Beheshti Univ Med Sci SBMU, Dept Epidemiol, Sch Publ Hlth & Safety, Tehran, Iran
[4] Univ Tehran, Inst Biochem & Biophys IBB, Dept Bioinformat, Lab Complex Biol Syst & Bioinformat CBB, Tehran, Iran
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
MODELS;
DOI
10.1038/s41598-024-69209-6
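Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code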
07 ; 0710 ; 09 ;
Abstract
In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making.
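The abstract reports three test-set metrics (accuracy, F1 score, and AUC). As a minimal sketch of how such metrics are computed for a binary mortality label, the following uses only the Python standard library. The labels, scores, and the 0.5 decision threshold are made-up toy values chosen for illustration; they are not taken from the study, and this is not the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of cases classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = died, 0 = survived.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # model risk scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the three numbers in a report such as this one can differ noticeably.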
Pages: 20