Improving prediction of groundwater quality in situations of limited monitoring data based on virtual sample generation and Gaussian process regression

Cited by: 3
Authors
Zhang, Jiang [1, 2, 3, 4]
Xiao, Changlai [1, 2, 3, 4]
Yang, Weifei [1, 2, 3, 4]
Liang, Xiujuan [1, 2, 3, 4]
Zhang, Linzuo [1, 2, 3, 4]
Wang, Xinkang [1, 2, 3, 4]
Dai, Rongkun [1, 2, 3, 4]
Affiliations
[1] Jilin Univ, Key Lab Groundwater Resources & Environm, Minist Educ, Changchun 130021, Peoples R China
[2] Jilin Univ, Jilin Prov Key Lab Water Resources & Environm, Changchun 130021, Peoples R China
[3] Jilin Univ, Coll New Energy & Environm, Changchun 130021, Peoples R China
[4] Natl Local Joint Engn Lab In Situ Convers, Drilling & Exploitat Technol Oil Shale, Changchun 130021, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
VSG; MD-MTD; Generative adversarial network; t-distributed stochastic neighbor embedding; GPR; Strontium in groundwater; NITRATE CONCENTRATION; TREND-DIFFUSION; MODEL; PERFORMANCE; MANIFOLD
DOI
10.1016/j.watres.2024.122498
Chinese Library Classification
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
The increasing pollution of aquifers by human activities over recent decades poses a threat to drinking water safety. While Gaussian process regression (GPR) is a robust tool for predicting and monitoring water quality, its effectiveness is hindered by the limited data available for model training and validation, known as the "small sample problem". Virtual sample generation (VSG) is one of several approaches proposed to address this problem. This study aimed to increase the accuracy of GPR for predicting water quality when datasets are limited. Three VSG methods, namely Multi-Distribution Mega-Trend Diffusion (MD-MTD), Generative Adversarial Network (GAN), and t-distributed stochastic neighbor embedding (t-SNE), were compared for their ability to improve the accuracy of GPR predictions of strontium (Sr²⁺). The models were used to predict Sr²⁺ in the shallow aquifer system in Songyuan, Jilin Province. The results showed that t-SNE provided the largest improvement in GPR accuracy, with R² increasing from 0.86 to 0.99 (12.98%), followed by MD-MTD (R² of 0.95, 9.39%), with the smallest improvement obtained by GAN (R² of 0.92, 5.98%). Boxplots show that MD-MTD-GPR predictions do not fully capture the observed data distribution. GAN accurately replicates the data distribution, while t-SNE-GPR achieves the highest prediction accuracy and handles data fluctuations. GPR accuracy improves as the number of virtual samples increases but tends to decrease once the number exceeds 258 in this study. These findings can guide efforts to improve GPR accuracy when datasets are limited, and can help improve water quality management and drinking water safety in regions with sparse monitoring data.
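The workflow summarized in the abstract (augment a small training set with virtual samples, refit the GPR, and compare R²) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the single predictor `x` is hypothetical, the kernel hyperparameters are fixed rather than optimized, and `jitter_virtual_samples` is a simple Gaussian-noise placeholder that stands in for MD-MTD, GAN, or t-SNE based generation. It is not expected to reproduce the reported R² values, and a naive generator like this one may not improve accuracy at all.

```python
import math
import random

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel for scalar inputs (fixed, assumed hyperparameters)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gpr_predict(x_train, y_train, x_test, noise=1e-2):
    """GPR posterior mean at x_test, using a zero-mean prior on centred targets."""
    mean_y = sum(y_train) / len(y_train)
    centred = [y - mean_y for y in y_train]
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve_chol(cholesky(K), centred)
    return [mean_y + sum(rbf(xs, xt) * w for xt, w in zip(x_train, alpha))
            for xs in x_test]

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def jitter_virtual_samples(x, y, n_virtual, rng, sx=0.1, sy=0.05):
    """Placeholder VSG (assumption): perturb randomly chosen real samples with noise."""
    xv, yv = [], []
    for _ in range(n_virtual):
        i = rng.randrange(len(x))
        xv.append(x[i] + rng.gauss(0.0, sx))
        yv.append(y[i] + rng.gauss(0.0, sy))
    return xv, yv

# Synthetic "small sample" dataset (not the Songyuan monitoring data).
rng = random.Random(0)
x_real = [0.0, 0.7, 1.5, 2.2, 3.0, 3.8, 4.5, 5.2]
y_real = [math.sin(v) + rng.gauss(0.0, 0.05) for v in x_real]
x_test = [0.35 + 0.5 * i for i in range(10)]
y_test = [math.sin(v) for v in x_test]

# Fit on real samples only, then on real plus virtual samples.
xv, yv = jitter_virtual_samples(x_real, y_real, 16, rng)
pred_real = gpr_predict(x_real, y_real, x_test)
pred_aug = gpr_predict(x_real + xv, y_real + yv, x_test)

print("R2, real samples only:     ", r2(y_test, pred_real))
print("R2, real + virtual samples:", r2(y_test, pred_aug))
```

Varying the number of virtual samples passed to `jitter_virtual_samples` and tracking R² on held-out data is one way to look for the kind of saturation point the abstract reports (258 virtual samples in that study).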
Pages: 17