Improving prediction of groundwater quality in situations of limited monitoring data based on virtual sample generation and Gaussian process regression

Cited by: 3
Authors
Zhang, Jiang [1 ,2 ,3 ,4 ]
Xiao, Changlai [1 ,2 ,3 ,4 ]
Yang, Weifei [1 ,2 ,3 ,4 ]
Liang, Xiujuan [1 ,2 ,3 ,4 ]
Zhang, Linzuo [1 ,2 ,3 ,4 ]
Wang, Xinkang [1 ,2 ,3 ,4 ]
Dai, Rongkun [1 ,2 ,3 ,4 ]
Affiliations
[1] Jilin Univ, Key Lab Groundwater Resources & Environm, Minist Educ, Changchun 130021, Peoples R China
[2] Jilin Univ, Jilin Prov Key Lab Water Resources & Environm, Changchun 130021, Peoples R China
[3] Jilin Univ, Coll New Energy & Environm, Changchun 130021, Peoples R China
[4] Natl Local Joint Engn Lab In Situ Convers, Drilling & Exploitat Technol Oil Shale, Changchun 130021, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
VSG; MD-MTD; Generative adversarial network; t-distributed stochastic neighbor embedding; GPR; Strontium in groundwater; NITRATE CONCENTRATION; TREND-DIFFUSION; MODEL; PERFORMANCE; MANIFOLD
DOI
10.1016/j.watres.2024.122498
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
The increasing pollution of aquifers by human activities over recent decades poses a threat to drinking water safety. Although Gaussian process regression (GPR) is a robust tool for predicting and monitoring water quality, its effectiveness is hindered by the limited data available for model training and validation, known as the "small sample problem". Approaches proposed to resolve this problem include virtual sample generation (VSG). This study aimed to increase the accuracy of GPR for predicting water quality when datasets are limited. Three VSG methods, namely Multi-Distribution Mega-Trend Diffusion (MD-MTD), a Generative Adversarial Network (GAN), and t-distributed stochastic neighbor embedding (t-SNE), were compared for enhancing the accuracy of GPR predictions of strontium (Sr²⁺). The models were used to predict Sr²⁺ in the shallow aquifer system in Songyuan, Jilin Province. The results showed that t-SNE provided the largest improvement in GPR accuracy, with R² increasing from 0.86 to 0.99 (12.98 %), followed by MD-MTD (R² of 0.95, 9.39 %), while GAN yielded the smallest improvement (R² of 0.92, 5.98 %). Boxplots show that MD-MTD-GPR predictions do not fully capture the distribution of the observed data. GAN accurately replicates the data distribution, while t-SNE-GPR achieves the highest prediction accuracy and handles data fluctuations. GPR accuracy improved as the number of virtual samples increased but tended to decrease once the number exceeded 258 in this study. This study can guide efforts to improve GPR accuracy when datasets are limited, and its results can help improve water quality management and drinking water safety in regions with sparse monitoring data.
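The abstract names mega-trend diffusion (MTD) based virtual sample generation and GPR but gives no formulas. The block below is a minimal, dependency-free sketch of those two building blocks, assuming the classic single-distribution MTD formulation (domain bounds from the data skew and variance, with virtual values drawn from a triangular membership function) rather than the paper's MD-MTD variant, whose distributions are not specified in this record. The GPR part assumes a squared-exponential kernel with fixed, hand-set hyperparameters and no optimisation. All function names (`mtd_bounds`, `mtd_sample`, `gpr_mean`, `r2_score`) are hypothetical, and how the study labels virtual samples or tunes its GPR is not stated here.

```python
import math
import random
import statistics


def mtd_bounds(values, phi=1e-20):
    """Classic MTD domain bounds for one attribute: (lower, centre, upper).

    phi is the membership value assigned at the bounds (1e-20 is the value
    commonly used with MTD; treated here as an assumption).
    """
    lo, hi = min(values), max(values)
    if lo == hi:  # degenerate attribute: nothing to diffuse
        return lo, lo, hi
    centre = (lo + hi) / 2
    n_low = sum(v < centre for v in values)   # >= 1 because lo < centre
    n_high = sum(v > centre for v in values)  # >= 1 because hi > centre
    var = statistics.variance(values)         # sample variance s^2
    skew_low = n_low / (n_low + n_high)
    skew_high = n_high / (n_low + n_high)
    lower = centre - skew_low * math.sqrt(-2 * var / n_low * math.log(phi))
    upper = centre + skew_high * math.sqrt(-2 * var / n_high * math.log(phi))
    # The estimated domain must at least cover the observed data.
    return min(lower, lo), centre, max(upper, hi)


def mtd_sample(values, n, rng):
    """Draw n virtual values from the triangular MTD membership function.

    Accepting uniform draws on [lower, upper] with probability equal to the
    triangular membership yields the same density as random.triangular.
    """
    lower, centre, upper = mtd_bounds(values)
    return [rng.triangular(lower, upper, centre) for _ in range(n)]


def rbf(a, b, length=1.0, signal=1.0):
    """Squared-exponential kernel between feature vectors a and b."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal ** 2 * math.exp(-d2 / (2 * length ** 2))


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gpr_mean(X, y, X_new, length=1.0, signal=1.0, noise=1e-2):
    """GPR predictive mean k(X_new, X) (K + noise^2 I)^-1 y, fixed hyperparameters."""
    n = len(X)
    K = [[rbf(X[i], X[j], length, signal) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Solve K alpha = y: forward substitution L z = y, then back substitution L^T alpha = z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k] for k in range(i + 1, n))) / L[i][i]
    return [sum(rbf(x, X[i], length, signal) * alpha[i] for i in range(n)) for x in X_new]


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, `mtd_bounds([0.0, 1.0, 2.0, 3.0, 4.0])` widens the observed range [0, 4] symmetrically around the centre 2.0, because two values fall on each side of it; `mtd_sample` then draws virtual values most densely near that centre. Comparing `r2_score` on held-out observations for a GPR trained with and without virtual samples mirrors, in spirit, the R² comparison reported in the abstract, but no results from the study are reproduced here.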
Pages: 17