Performance and explainability of feature selection-boosted tree-based classifiers for COVID-19 detection

Cited by: 2
Authors
Rufino, Jesus [1]
Ramirez, Juan Marcos [1]
Aguilar, Jose [1, 2, 3]
Baquero, Carlos [4, 5]
Champati, Jaya [1]
Frey, Davide [6]
Lillo, Rosa Elvira [7]
Fernandez-Anta, Antonio [1]
Affiliations
[1] IMDEA Networks Inst, Madrid 28918, Spain
[2] Univ Los Andes, CEMISID, Merida 5101, Venezuela
[3] Univ EAFIT, CIDITIC, Medellin, Colombia
[4] Univ Minho, Braga, Portugal
[5] INESCTEC, Braga, Portugal
[6] INRIA, Rennes, France
[7] Univ Carlos III, Madrid, Spain
Keywords
COVID-19; detection; Explainability analysis; Gradient boosting classifiers; Random forest; Recursive feature elimination; Shapley values
DOI
10.1016/j.heliyon.2023.e23219
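Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09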
Abstract
In this paper, we evaluate the performance and analyze the explainability of feature-selection-boosted machine learning models that predict COVID-19-positive cases from self-reported information. In essence, this work describes a methodology for identifying COVID-19 infections that exploits the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, the methodology first performs a feature selection stage based on recursive feature elimination (RFE) to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect active COVID-19 cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine from a broad range of features, including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, we used three supervised classifiers: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Using UMD-CTIS data, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed with several quality metrics: F1-score, sensitivity, specificity, precision, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). This work also shows the normalized daily incidence curves estimated by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and its contribution for each country and for each country-year combination.
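The two-stage pipeline summarized in the abstract (RFE feature selection, then a tree-based classifier scored with F1, sensitivity, specificity, precision and AUC) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: it uses synthetic data from `make_classification` in place of UMD-CTIS survey responses, trains only a random forest (the LGB and XGB variants are omitted), and the helper name `rfe_then_rf`, the number of retained features, the RFE step size and the tree counts are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def rfe_then_rf(X, y, n_features=10, seed=0):
    """Hypothetical helper: RFE feature selection, then a random forest.

    Returns the boolean mask of selected columns and a dict of test metrics.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Stage 1: recursive feature elimination. At each iteration RFE refits the
    # estimator and drops the `step` columns with the lowest feature_importances_.
    selector = RFE(
        RandomForestClassifier(n_estimators=50, random_state=seed),
        n_features_to_select=n_features,
        step=2,
    )
    selector.fit(X_tr, y_tr)

    # Stage 2: train the final classifier on the selected columns only.
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(selector.transform(X_tr), y_tr)

    X_te_sel = selector.transform(X_te)
    y_pred = clf.predict(X_te_sel)
    y_score = clf.predict_proba(X_te_sel)[:, 1]  # probability of the positive class

    # Confusion-matrix counts for the positive class (label 1).
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0, 1]).ravel()
    metrics = {
        "f1": f1_score(y_te, y_pred),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "auc": roc_auc_score(y_te, y_score),  # area under the ROC curve
    }
    return selector.support_, metrics


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for survey data (about 20% positives).
    X, y = make_classification(
        n_samples=2000, n_features=40, n_informative=8,
        weights=[0.8, 0.2], random_state=0,
    )
    support, metrics = rfe_then_rf(X, y, n_features=10)
    print("selected feature indices:", support.nonzero()[0])
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Replacing the final `RandomForestClassifier` with LightGBM's `LGBMClassifier` or XGBoost's `XGBClassifier` would roughly mirror the other two models named in the abstract; the Shapley-value explainability analysis would be a separate step on the fitted model and is not shown here.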
Pages: 21