Leptospirosis modelling using hydrometeorological indices and random forest machine learning

Authors
Veianthan Jayaramu
Zed Zulkafli
Simon De Stercke
Wouter Buytaert
Fariq Rahmat
Ribhan Zafira Abdul Rahman
Asnor Juraiza Ishak
Wardah Tahir
Jamalludin Ab Rahman
Nik Mohd Hafiz Mohd Fuzi
Affiliations
[1] Universiti Putra Malaysia,Department of Civil Engineering
[2] Imperial College London,Department of Civil and Environmental Engineering
[3] Universiti Putra Malaysia,Department of Electrical and Electronic Engineering
[4] Universiti Teknologi Mara,Flood Control Research Group, Faculty of Civil Engineering
[5] International Islamic University Malaysia,Department of Community Medicine, Kulliyyah of Medicine
[6] Ministry of Health Malaysia,Kelantan State Health Department
Source
International Journal of Biometeorology | 2023 / Vol. 67
Keywords
Leptospirosis; Hydrometeorological indices; Cross-correlation analysis; Random forest; Variable importance; Feature selection
Abstract
Leptospirosis is a zoonosis that has been linked to hydrometeorological variability. Hydrometeorological averages and extremes have previously been used as drivers in the statistical prediction of the disease, but their relative importance and predictive capacity remain poorly understood. This study used a random forest classifier to analyze the relative importance of hydrometeorological indices in a leptospirosis model and to evaluate model performance by the type of indices used. Case data came from three districts in Kelantan, Malaysia, that experience annual monsoonal rainfall and flooding. First, hydrometeorological data (rainfall, streamflow, water level, relative humidity, and temperature) were transformed into 164 weekly average and extreme indices in accordance with the Expert Team on Climate Change Detection and Indices (ETCCDI). Then, weekly case occurrences were classified into the binary classes “high” and “low” based on an average threshold. Seventeen models based on “average,” “extreme,” and “mixed” indices were trained, with feature subsets optimized using the model-computed mean decrease Gini (MDG) scores. Variable importance was assessed through cross-correlation analysis and the MDG score. The average and extreme models showed similar prediction accuracy ranges (61.5–76.1% and 72.3–77.0%, respectively), while the mixed models showed an improvement (71.7–82.6%). An extreme model was the most sensitive, while an average model was the most specific. The time lags associated with the driving indices agreed with the seasonality of the monsoon. Rainfall (extreme indices) was the most important variable in classifying leptospirosis occurrence, while streamflow was the least important despite showing higher correlations with leptospirosis.
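The three building blocks named in the abstract (thresholding weekly cases into "high" and "low", lagged cross-correlation, and Gini-based importance) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The weekly values are invented toy data, the function names are hypothetical, and the "average threshold" is assumed here to mean the mean weekly count, with "high" taken as strictly above it. The last function computes the Gini impurity decrease of a single split; a random forest accumulates such decreases per variable over all splits and trees to obtain the MDG score.

```python
from statistics import mean

# Invented toy data: weekly rainfall index and weekly case counts.
rain = [5, 80, 10, 90, 5, 70, 10, 5]
cases = [1, 2, 9, 1, 10, 2, 8, 1]


def label_weeks(counts):
    """Label each week 1 ("high") or 0 ("low") against the mean count (assumed threshold)."""
    threshold = mean(counts)
    return [1 if c > threshold else 0 for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def lagged_correlation(index, counts, lag):
    """Correlate the index at week t - lag with the case count at week t."""
    if lag == 0:
        return pearson(index, counts)
    return pearson(index[:-lag], counts[lag:])


def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_decrease(feature, labels, split):
    """Impurity decrease from splitting the labels at feature <= split."""
    left = [l for f, l in zip(feature, labels) if f <= split]
    right = [l for f, l in zip(feature, labels) if f > split]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


labels = label_weeks(cases)
r_lag0 = lagged_correlation(rain, cases, 0)
r_lag1 = lagged_correlation(rain, cases, 1)
# Use the previous week's rainfall as the feature for this week's class.
decrease_lag1 = gini_decrease(rain[:-1], labels[1:], split=40)
print(labels, round(r_lag0, 2), round(r_lag1, 2), round(decrease_lag1, 3))
```

In the toy series, cases follow rainfall by one week, so the one-week lag gives both a stronger correlation and a clean split. On real data the two measures can disagree, as the abstract reports for streamflow.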
Pages: 423-437
Page count: 14