Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

Times cited: 13
Authors
Feng, Cindy [1]
Kephart, George [1]
Juarez-Colunga, Elizabeth [2]
Affiliations
[1] Dalhousie Univ, Dept Community Hlth & Epidemiol, Fac Med, 5790 Univ Ave, Halifax, NS B3H 1V7, Canada
[2] Univ Colorado, Dept Biostat & Informat, Anschutz Med Campus, Aurora, CO 80045 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
COVID-19; mortality; Predictive model; Generalized additive model; Classification trees; Extreme gradient boosting; LOGISTIC-REGRESSION; MODELS
DOI
10.1186/s12874-021-01441-4
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting mortality risk among infected individuals is crucial for prioritizing medical care and reducing the burden on the healthcare system. This study assessed how accurately machine learning methods predict COVID-19 mortality risk.

Methods: We compared the performance of classification trees, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive models (GAM) and linear discriminant analysis (LDA) in predicting mortality risk among 49,216 COVID-19-positive cases reported in Toronto, Canada, from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Models were fitted on the training samples, and their predictive accuracy on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), the Brier score, the calibration intercept and the calibration slope.

Results: XGBoost was highly discriminative, with an AUC of 0.9669, and outperformed the conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier scores.

Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
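The abstract names four accuracy measures (AUC, Brier score, calibration intercept, calibration slope) without giving formulas. The sketch below shows one common way to compute them in pure Python, assuming binary outcomes (1 = died, 0 = survived) and predicted probabilities in (0, 1). The function names `auc`, `brier_score`, `calibration_intercept` and `calibration_slope` are hypothetical, the toy data are invented for illustration, and the calibration measures follow the usual logistic-recalibration convention (intercept estimated with logit(p) as a fixed offset; slope as the coefficient on logit(p)), which may differ in detail from the authors' implementation.

```python
import math
from typing import List, Sequence


def _logit(p: float, eps: float = 1e-12) -> float:
    # Clip into (eps, 1 - eps) so predicted probabilities of exactly 0 or 1 stay finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney rank-sum statistic; tied scores get average ranks."""
    n1 = sum(y)
    n0 = len(y) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one death (1) and one survivor (0)")
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks: List[float] = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied scores starting at sorted position i.
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum_pos - n1 * (n1 + 1) / 2) / (n1 * n0)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_intercept(y: Sequence[int], p: Sequence[float],
                          iters: int = 100, tol: float = 1e-10) -> float:
    """a in logit P(y=1) = a + logit(p), with logit(p) as a fixed offset (Newton-Raphson)."""
    lp = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + l) for l in lp]
        grad = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mi * (1.0 - mi) for mi in mu)
        step = grad / info
        a += step
        if abs(step) < tol:
            break
    return a


def calibration_slope(y: Sequence[int], p: Sequence[float],
                      iters: int = 100, tol: float = 1e-10) -> float:
    """b in logit P(y=1) = a + b * logit(p), fitted by Newton-Raphson.

    Needs at least two distinct predicted probabilities, otherwise the
    information matrix is singular.
    """
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for yi, l in zip(y, lp):
            mu = _sigmoid(a + b * l)
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * l    # score for the slope
            i00 += w
            i01 += w * l
            i11 += w * l * l
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 Newton system [[i00, i01], [i01, i11]] @ [da, db] = [g0, g1].
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return b


if __name__ == "__main__":
    # Toy data (not from the study): 1 = died, 0 = survived.
    y = [1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
    p = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
    print(round(auc(y, p), 4))          # 0.7755
    print(round(brier_score(y, p), 4))  # 0.1857
    p_extreme = [0.05] * 5 + [0.5] * 4 + [0.95] * 5
    print(round(calibration_slope(y, p_extreme), 4))  # 0.4708
```

In this toy example the first set of predictions matches the observed death rate in each group, so the calibration intercept is about 0 and the slope about 1. Pushing the same predictions toward 0 and 1 leaves the ranking (and hence the AUC) unchanged but drives the slope below 1, the usual signature of over-confident, overfitted predictions.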
Number of pages: 14