Improving prediction of groundwater quality in situations of limited monitoring data based on virtual sample generation and Gaussian process regression

Cited by: 3
Authors
Zhang, Jiang [1, 2, 3, 4]
Xiao, Changlai [1, 2, 3, 4]
Yang, Weifei [1, 2, 3, 4]
Liang, Xiujuan [1, 2, 3, 4]
Zhang, Linzuo [1, 2, 3, 4]
Wang, Xinkang [1, 2, 3, 4]
Dai, Rongkun [1, 2, 3, 4]
Affiliations
[1] Jilin Univ, Key Lab Groundwater Resources & Environm, Minist Educ, Changchun 130021, Peoples R China
[2] Jilin Univ, Jilin Prov Key Lab Water Resources & Environm, Changchun 130021, Peoples R China
[3] Jilin Univ, Coll New Energy & Environm, Changchun 130021, Peoples R China
[4] Natl Local Joint Engn Lab In Situ Convers, Drilling & Exploitat Technol Oil Shale, Changchun 130021, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
VSG; MD-MTD; Generative adversarial network; t-distributed stochastic neighbor embedding; GPR; Strontium in groundwater; NITRATE CONCENTRATION; TREND-DIFFUSION; MODEL; PERFORMANCE; MANIFOLD
DOI
10.1016/j.watres.2024.122498
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The increasing pollution of aquifers by human activities over recent decades poses a threat to drinking water safety. While Gaussian Process Regression (GPR) is a robust tool for predicting and monitoring water quality, its effectiveness is hindered by the limited data available for model training and validation, known as the "small sample problem". Virtual sample generation (VSG) is among the approaches proposed to resolve this problem. This study aimed to increase the accuracy of GPR for predicting water quality when datasets are limited. Three VSG methods, namely Multi-Distribution Mega-Trend Diffusion (MD-MTD), Generative Adversarial Network (GAN), and t-distributed stochastic neighbor embedding (t-SNE), were compared for their ability to enhance the accuracy of GPR predictions of strontium (Sr²⁺). The models were used to predict Sr²⁺ in the shallow aquifer system in Songyuan, Jilin Province. The results showed that t-SNE provided the greatest improvement in GPR accuracy, with R² increasing from 0.86 to 0.99 (12.98 %), followed by MD-MTD (R² of 0.95, 9.39 %), with the least improvement obtained by GAN (R² of 0.92, 5.98 %). Boxplots show that MD-MTD-GPR predictions do not fully capture the observed data distributions. GANs accurately replicate the data distribution, while t-SNE-GPR achieves the highest prediction accuracy and handles data fluctuations. In this study, GPR accuracy improved as the number of virtual samples increased but tended to decrease once the number exceeded 258. This study can guide efforts to improve GPR accuracy where datasets are limited, and its results can help improve water quality management and drinking water safety in regions with sparse monitoring data.
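The abstract combines two building blocks: virtual sample generation and GPR. The block below is a minimal, pure-Python sketch of both under stated assumptions; it is not the authors' implementation. For VSG it uses the classic single-attribute mega-trend diffusion (MTD) domain estimate with triangular-membership acceptance sampling, a simpler relative of the MD-MTD method evaluated in the study (GAN and t-SNE are not shown). For GPR it uses a zero-mean Gaussian process with a squared-exponential (RBF) kernel whose hyperparameters (`length`, `signal_var`, `noise_var`) are fixed, hypothetical values rather than fitted ones. All function names are illustrative.

```python
import math
import random
import statistics


def mtd_bounds(values):
    """Mega-trend diffusion: estimate an extended domain for one attribute.

    Returns (lower, upper, centre). The diffusion constant 1e-20 follows classic MTD.
    """
    lo, hi = min(values), max(values)
    centre = (lo + hi) / 2.0                     # midpoint of the observed range
    n_l = sum(1 for v in values if v < centre)   # observations below the centre
    n_u = sum(1 for v in values if v > centre)   # observations above the centre
    if n_l + n_u == 0:                           # all values identical: nothing to diffuse
        return lo, hi, centre
    var = statistics.variance(values)            # sample variance s_x^2
    spread = -2.0 * math.log(1e-20)              # positive constant of the diffusion function
    lower, upper = centre, centre
    if n_l:                                      # extend downwards, weighted by left-side skewness
        lower = centre - (n_l / (n_l + n_u)) * math.sqrt(var * spread / n_l)
    if n_u:                                      # extend upwards, weighted by right-side skewness
        upper = centre + (n_u / (n_l + n_u)) * math.sqrt(var * spread / n_u)
    # The estimated domain is never narrower than the observed data.
    return min(lower, lo), max(upper, hi), centre


def mtd_sample(values, n, rng):
    """Draw n virtual values in the MTD domain, accepting each with its triangular membership."""
    lower, upper, centre = mtd_bounds(values)
    out = []
    while len(out) < n:
        v = rng.uniform(lower, upper)
        if v <= centre:
            m = (v - lower) / (centre - lower) if centre > lower else 1.0
        else:
            m = (upper - v) / (upper - centre) if upper > centre else 1.0
        if rng.random() <= m:                    # values near the centre are kept more often
            out.append(v)
    return out


def rbf(a, b, length, signal_var):
    """Squared-exponential (RBF) covariance between two input vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * sq / length ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_sub(low, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def backward_sub(low, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gpr_predict(X, y, x_star, length=1.0, signal_var=1.0, noise_var=1e-6):
    """Posterior mean and variance at x_star of a zero-mean GP with fixed hyperparameters."""
    n = len(X)
    K = [[rbf(X[i], X[j], length, signal_var) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(K)
    alpha = backward_sub(low, forward_sub(low, y))   # (K + noise_var * I)^-1 y
    k_star = [rbf(xi, x_star, length, signal_var) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(low, k_star)
    var = rbf(x_star, x_star, length, signal_var) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)                       # clip tiny negative round-off
```

In a VSG workflow, virtual samples drawn this way would be added to the original observations before fitting the GPR. How virtual inputs and outputs are paired differs between VSG methods and is not modelled in this sketch.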
Pages: 17