A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population

Cited by: 17
Authors
Chowdhury, Mohammad Ziaul Islam [1, 2, 3]
Leung, Alexander A. [1, 4]
Walker, Robin L. [1, 5]
Sikdar, Khokan C. [6]
O'Beirne, Maeve [2]
Quan, Hude [1]
Turin, Tanvir C. [1, 2]
Affiliations
[1] University of Calgary, Department of Community Health Sciences, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
[2] University of Calgary, Department of Family Medicine, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
[3] University of Calgary, Department of Psychiatry, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
[4] University of Calgary, Department of Medicine, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
[5] Alberta Health Services, Primary Health Care Integration Network, Primary Health Care, Calgary, AB, Canada
[6] Alberta Health Services, Health Status Assessment, Surveillance & Reporting, Public Health Surveillance & Infrastructure, Provincial Population & Public Health, 10101 Southport Rd SW, Calgary, AB T2W 3N2, Canada
Keywords
Risk prediction; Imputation; Health; Age
DOI
10.1038/s41598-022-27264-x
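Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]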
Subject classification codes
07; 0710; 09
Abstract
Risk prediction models are frequently used to identify individuals at risk of developing hypertension. This study evaluates several machine learning algorithms and compares their predictive performance with that of the conventional Cox proportional hazards (PH) model for predicting hypertension incidence from survival data. We analyzed 18,322 participants and 24 candidate features from Alberta's Tomorrow Project (ATP), a large cohort study, to develop the prediction models. To select the top features, we applied five feature selection methods: two filter-based (univariate Cox p-value and C-index), two embedded (random survival forest and least absolute shrinkage and selection operator (Lasso)), and one constraint-based (the statistically equivalent signature (SES)). Five machine learning algorithms were developed to predict hypertension incidence: the penalized regressions Ridge, Lasso, and Elastic Net (EN), random survival forest (RSF), and gradient boosting (GB); these were compared with the conventional Cox PH model. Predictive performance was assessed using the C-index. The machine learning algorithms performed similarly to the conventional Cox PH model: average C-indexes were 0.78, 0.78, 0.78, 0.76, 0.76, and 0.77 for Ridge, Lasso, EN, RSF, GB, and Cox PH, respectively. The important features associated with each model are also presented. Our findings show little difference in predictive performance between the machine learning algorithms and the conventional Cox PH regression model for predicting hypertension incidence. In a moderately sized dataset with a reasonable number of features, conventional regression-based models perform similarly to machine learning algorithms, with good predictive accuracy.
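The models above are compared by their C-index, where 0.5 corresponds to random ranking and 1.0 to perfect ranking of who develops hypertension first. As a reading aid, here is a minimal sketch of Harrell's concordance index for right-censored data. It assumes the common convention that a pair of participants is comparable when the one with the shorter follow-up had an observed event; pairs with tied follow-up times are simply skipped. This record does not state which C-index variant or software the study used, so the function name `harrell_c_index` and the toy data are illustrative only.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    times:  follow-up time for each participant
    events: 1 if the event (e.g. incident hypertension) was observed, 0 if censored
    risk:   model-predicted risk score; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier time is an observed event;
        # tied follow-up times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: four participants, the third one censored.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    print(harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.2]))  # perfect ranking
    print(harrell_c_index(times, events, [0.2, 0.7, 0.3, 0.9]))  # mostly wrong ranking
```

In the toy data, five of the six pairs are comparable (the pair whose earlier member is censored is dropped), so the first ranking scores 1.0 and the second scores 0.2. Read against this scale, average values of 0.76 to 0.78 indicate that all six models order participants' risk correctly in roughly three to four out of every five comparable pairs.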
Pages: 13
Published in: Scientific Reports, 13