Predicting water quality variables using gradient boosting machine: global versus local explainability using SHapley Additive Explanations (SHAP)

Authors
Merabet, Khaled [1]
Di Nunno, Fabio [2]
Granata, Francesco [2]
Kim, Sungwon [3]
Adnan, Rana Muhammad [4, 7]
Heddam, Salim [1]
Kisi, Ozgur [5, 8]
Zounemat-Kermani, Mohammad [6]
Affiliations
[1] Univ 20 Aout 1955, Fac Sci, Agron Dept, Hydraul Div, Route El Hadaik,BP 26, Skikda, Algeria
[2] Univ Cassino & Southern Lazio, Dept Civil & Mech Engn DICEM, Via Biasio, 43, I-03043 Cassino, Frosinone, Italy
[3] Dongyang Univ, Dept Railroad Construct & Safety Engn, Yeongju 36040, South Korea
[4] Guangzhou Univ, Coll Architecture & Urban Planning, Guangzhou 510006, Peoples R China
[5] Ilia State Univ, Sch Technol, Dept Civil Engn, Tbilisi 0179, Georgia
[6] Shahid Bahonar Univ Kerman, Dept Civil Engn, Kerman, Iran
[7] Saveetha Inst Med & Tech Sci, Ctr global Hlth Res, Chennai 600001, India
[8] Korea Univ, Sch Civil Environm & Architectural Engn, Seoul 02841, South Korea
Keywords
Modelling; Water quality; Chl-a; DO; TU; AdaBoost; Boosting models; SHAP; SHORT-TERM-MEMORY; DISSOLVED-OXYGEN; LEARNING-MODEL; XGBOOST; RIVER; FRAMEWORK
DOI
10.1007/s12145-025-01796-y
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Water quality assessment is critical for ensuring the health of aquatic ecosystems and managing water resources effectively. However, accurately predicting key water quality variables remains challenging due to the complex interactions between environmental factors and anthropogenic influences. In the present investigation, a new modelling framework is proposed for better prediction of three water quality variables: (i) dissolved oxygen concentration (DO), (ii) water turbidity (TU), and (iii) water chlorophyll a (Chl-a). Six machine learning models, i.e., adaptive boosting (AdaBoost), categorical boosting (CatBoost), histogram gradient boosting (HistGBRT), light gradient boosting machine (LightGBM), natural gradient boosting (NGBoost), and extreme gradient boosting (XGBoost), were applied and compared using combinations of a large number of water quality variables. All models were developed using data collected from three stations: (i) USGS 05543010 Illinois River at Seneca, Illinois County, (ii) USGS 05586300 Illinois River at Florence, Illinois County, and (iii) USGS 05553700 Illinois River at Starved Rock, Illinois County, USA. SHapley Additive exPlanations (SHAP) were adopted for model interpretability and feature ranking. Furthermore, all models were compared using various numerical indices and graphical representations. The following conclusions can be drawn from the results. DO concentration can be predicted very well: the CatBoost model performed best, with RMSE = 0.430, MAE = 0.326, R = 0.980 and NSE = 0.961. For Chl-a, all models were less accurate, and the best performance was obtained with LightGBM, with RMSE = 5.916, MAE = 4.294, R = 0.892 and NSE = 0.795. For water TU, none of the models were found to be accurate and very poor performances were obtained. Finally, the use of SHAP significantly helped in understanding the overall contribution of the various water quality variables to the final prediction.
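The four indices quoted in the abstract (RMSE, MAE, R and NSE) can be computed from paired observed and predicted values. The sketch below is a minimal illustration, assuming the standard definitions (R taken as the Pearson correlation coefficient and NSE as the Nash-Sutcliffe efficiency); the abstract does not state the formulas, so this is not the authors' code. The function name `regression_metrics` is hypothetical, and the sample values are invented for illustration, not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, Pearson R and NSE for two equal-length sequences.

    Assumed definitions (not confirmed by the source abstract):
      RMSE = sqrt(mean((p - o)^2))
      MAE  = mean(|p - o|)
      R    = Pearson correlation between observed and predicted
      NSE  = 1 - sum((p - o)^2) / sum((o - mean(o))^2)
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n < 2:
        raise ValueError("at least two samples are required")

    errors = [p - o for o, p in zip(observed, predicted)]
    sum_sq_err = sum(e * e for e in errors)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of squared deviations and cross-deviations about the means.
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cross = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    if ss_o == 0 or ss_p == 0:
        # R and NSE are undefined when either series is constant.
        raise ValueError("observed and predicted must each vary")

    return {
        "RMSE": math.sqrt(sum_sq_err / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R": cross / math.sqrt(ss_o * ss_p),
        "NSE": 1.0 - sum_sq_err / ss_o,
    }


if __name__ == "__main__":
    # Invented dissolved-oxygen-like values, for illustration only.
    obs = [7.9, 8.4, 9.1, 10.2, 11.0]
    pred = [8.1, 8.3, 9.4, 9.9, 11.2]
    for name, value in regression_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, RMSE and MAE carry the units of the predicted variable, while R and NSE are dimensionless. An NSE of 1 indicates a perfect fit, and an NSE of 0 means the model does no better than predicting the observed mean.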
Pages: 34