Performance and explainability of feature selection-boosted tree-based classifiers for COVID-19 detection

Cited by: 2
Authors
Rufino, Jesus [1]
Ramirez, Juan Marcos [1]
Aguilar, Jose [1, 2, 3]
Baquero, Carlos [4, 5]
Champati, Jaya [1]
Frey, Davide [6]
Lillo, Rosa Elvira [7]
Fernandez-Anta, Antonio [1]
Affiliations
[1] IMDEA Networks Inst, Madrid 28918, Spain
[2] Univ Los Andes, CEMISID, Merida 5101, Venezuela
[3] Univ EAFIT, CIDITIC, Medellin, Colombia
[4] Univ Minho, Braga, Portugal
[5] INESCTEC, Braga, Portugal
[6] INRIA, Rennes, France
[7] Univ Carlos III, Madrid, Spain
Keywords
COVID-19; detection; Explainability analysis; Gradient boosting classifiers; Random forest; Recursive feature elimination; Shapley values
DOI
10.1016/j.heliyon.2023.e23219
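Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract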
In this paper, we evaluate the performance and analyze the explainability of machine learning models boosted by feature selection in predicting COVID-19-positive cases from self-reported information. In essence, this work describes a methodology to identify COVID-19 infections that considers the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, this methodology performs a feature selection stage based on the recursive feature elimination (RFE) method to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect COVID-19-active cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine considering a broad range of features including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, three different supervised classifiers were used: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Based on data collected from the UMD-CTIS, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed in terms of various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). This work also shows the normalized daily incidence curves obtained by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and the corresponding contribution for each country and each country/year.
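The threshold-based quality metrics named in the abstract (precision, sensitivity, specificity, and F1-score) all follow from the four confusion-matrix counts of a binary detector. The sketch below is only an illustration of those standard definitions: the function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper or from UMD-CTIS data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return precision, sensitivity, specificity and F1 from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives of a binary (positive/negative) detector.
    """
    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)      # fraction of predicted positives that are correct
    sensitivity = ratio(tp, tp + fn)    # fraction of actual positives detected (recall)
    specificity = ratio(tn, tn + fp)    # fraction of actual negatives correctly rejected
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    metrics = detection_metrics(tp=80, fp=20, tn=880, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

ROC and AUC are not covered by this sketch, since they require the classifier's continuous scores rather than a single set of counts.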
Pages: 21
Related papers
50 in total
  • [21] COVID-19 Cases Prediction in Saudi Arabia Using Tree-based Ensemble Models
    Almazroi, Abdulwahab Ali
    Usmani, Raja Sher Afgun
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (01) : 389 - 400
  • [22] A Tree-based Mortality Prediction Model of COVID-19 from Routine Blood Samples
    Qomariyah, Nunung Nurul
    Andi Purwita, Ardimas
    Atas Asri, Sri Dhuny
    Kazakov, Dimitar
    8th International Conference on ICT for Smart Society: Digital Twin for Smart Society, ICISS 2021 - Proceeding, 2021,
  • [23] Optimized Intrusion Detection for IoMT Networks with Tree-Based Machine Learning and Filter-Based Feature Selection
    Balhareth, Ghaida
    Ilyas, Mohammad
    SENSORS, 2024, 24 (17)
  • [24] An intelligent DDoS attack detection tree-based model using Gini index feature selection method
    Bouke, Mohamed Aly
    Abdullah, Azizol
    ALshatebi, Sameer Hamoud
    Abdullah, Mohd Taufik
    El Atigh, Hayate
    MICROPROCESSORS AND MICROSYSTEMS, 2023, 98
  • [25] Performance Assessment of Decision Tree-based Predictive Classifiers for Risk Pregnancy Care
    Moreira, Mario W. L.
    Rodrigues, Joel J. P. C.
    Kumar, Neeraj
    Niu, Jianwei
    Woungang, Isaac
    GLOBECOM 2017 - 2017 IEEE GLOBAL COMMUNICATIONS CONFERENCE, 2017,
  • [26] Health monitoring of automotive clutch system through feature fusion and application of tree-based classifiers
    Kurian, Jonathan
    Sridharan, Naveen Venkatesh
    Chakrapani, Ganjikunta
    Vaithiyanathan, Sugumaran
    STRUCTURAL HEALTH MONITORING-AN INTERNATIONAL JOURNAL, 2024,
  • [27] Tree-Based Morse Regions: A Topological Approach to Local Feature Detection
    Xu, Yongchao
    Monasse, Pascal
    Geraud, Thierry
    Najman, Laurent
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (12) : 5612 - 5625
  • [28] Cardiac anomaly detection based on time and frequency domain features using tree-based classifiers
    Kropf, M.
    Hayn, D.
    Morris, D.
    Radhakrishnan, Aravind-Kumar
    Belyayskiy, E.
    Frydas, A.
    Pieske-Kraigher, E.
    Pieske, B.
    Schreier, G.
    PHYSIOLOGICAL MEASUREMENT, 2018, 39 (11)
  • [29] Occlusion Detection Based on Fractal Texture Analysis in Surveillance Videos Using Tree-Based Classifiers
    Arunnehru, J.
    Geetha, M. Kalaiselvi
    Nanthini, T.
    SECURITY IN COMPUTING AND COMMUNICATIONS (SSCC 2015), 2015, 536 : 307 - 316
  • [30] COVID-19 disease identification network based on weakly supervised feature selection
    Liu, Jingyao
    Feng, Qinghe
    Miao, Yu
    He, Wei
    Shi, Weili
    Jiang, Zhengang
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (05) : 9327 - 9348