Comparative analysis of feature selection techniques for COVID-19 dataset

Cited by: 2
Authors
Mohtasham, Farideh [1]
Pourhoseingholi, MohamadAmin [2]
Nazari, Seyed Saeed Hashemi [3]
Kavousi, Kaveh [4]
Zali, Mohammad Reza [1]
Affiliations
[1] Shahid Beheshti Univ Med Sci, Res Inst Gastroenterol & Liver Dis, Gastroenterol & Liver Dis Res Ctr, Tehran, Iran
[2] Univ Nottingham, Natl Inst Hlth & Care Res NIHR Nottingham Biomed R, Hearing Sci Mental Hlth & Clin Neurosci, Sch Med, Nottingham, England
[3] Shahid Beheshti Univ Med Sci SBMU, Dept Epidemiol, Sch Publ Hlth & Safety, Tehran, Iran
[4] Univ Tehran, Inst Biochem & Biophys IBB, Dept Bioinformat, Lab Complex Biol Syst & Bioinformat CBB, Tehran, Iran
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
MODELS;
DOI
10.1038/s41598-024-69209-6
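Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];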
Discipline classification codes
07; 0710; 09;
Abstract
In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making.
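The pipeline the abstract describes (select features, train a Random Forest, report accuracy, F1, and AUC on held-out data) can be illustrated with a minimal sketch. This is not the authors' Hybrid Boruta-VI method: it is a simplified shadow-feature filter in the spirit of Boruta, run on synthetic data from scikit-learn rather than the patient cohort. The function name `shadow_feature_selection`, the number of rounds, the majority-vote rule, and all dataset parameters are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def shadow_feature_selection(X, y, n_rounds=10, seed=0):
    """Simplified Boruta-style filter (hypothetical helper, not the paper's code).

    Keeps the features whose Random Forest importance beats the best shuffled
    ("shadow") feature in more than half of the rounds.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # Shadow features: each column permuted independently, which breaks
        # any relationship with the outcome y.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(
            n_estimators=100, random_state=seed + round_idx
        )
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        threshold = importances[n_features:].max()  # best shadow importance
        hits += importances[:n_features] > threshold
    return np.flatnonzero(hits > n_rounds / 2)


# Synthetic stand-in for a clinical table: 20 features, 5 of them informative,
# with an imbalanced binary outcome (assumed values, not the study's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=0,
    weights=[0.8, 0.2], shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Select features on the training split only, to avoid leaking test data.
selected = shadow_feature_selection(X_train, y_train)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[:, selected], y_train)

pred = model.predict(X_test[:, selected])
proba = model.predict_proba(X_test[:, selected])[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "auc": roc_auc_score(y_test, proba),
}
print("selected feature indices:", selected.tolist())
print(metrics)
```

Feature selection is performed on the training split alone, so the test metrics are not inflated by information from held-out rows. The numbers this sketch prints describe the synthetic data only and are not comparable to the values reported in the abstract.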
Pages: 20