Improving prediction of groundwater quality in situations of limited monitoring data based on virtual sample generation and Gaussian process regression

Cited by: 3
Authors
Zhang, Jiang [1 ,2 ,3 ,4 ]
Xiao, Changlai [1 ,2 ,3 ,4 ]
Yang, Weifei [1 ,2 ,3 ,4 ]
Liang, Xiujuan [1 ,2 ,3 ,4 ]
Zhang, Linzuo [1 ,2 ,3 ,4 ]
Wang, Xinkang [1 ,2 ,3 ,4 ]
Dai, Rongkun [1 ,2 ,3 ,4 ]
Affiliations
[1] Jilin Univ, Key Lab Groundwater Resources & Environm, Minist Educ, Changchun 130021, Peoples R China
[2] Jilin Univ, Jilin Prov Key Lab Water Resources & Environm, Changchun 130021, Peoples R China
[3] Jilin Univ, Coll New Energy & Environm, Changchun 130021, Peoples R China
[4] Natl Local Joint Engn Lab In Situ Convers, Drilling & Exploitat Technol Oil Shale, Changchun 130021, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
VSG; MD-MTD; Generative adversarial network; t-distributed stochastic neighbor embedding; GPR; Strontium in groundwater; NITRATE CONCENTRATION; TREND-DIFFUSION; MODEL; PERFORMANCE; MANIFOLD
DOI
10.1016/j.watres.2024.122498
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The increasing pollution of aquifers by human activities over recent decades poses a threat to drinking water safety. While Gaussian Process Regression (GPR) is a robust tool for predicting and monitoring water quality, its effectiveness is hindered by the limited data available for model training and validation, known as the "small sample problem". One approach to this problem is virtual sample generation (VSG). This study aimed to increase the accuracy of GPR for predicting water quality when datasets are limited. Three VSG methods, namely Multi-Distribution Mega-Trend Diffusion (MD-MTD), Generative Adversarial Network (GAN), and t-distributed stochastic neighbor embedding (t-SNE), were compared for their ability to enhance the accuracy of GPR predictions of strontium (Sr²⁺). The models were used to predict Sr²⁺ in the shallow aquifer system in Songyuan, Jilin Province. The results showed that t-SNE provided the largest improvement in GPR accuracy, with R² increasing from 0.86 to 0.99 (12.98 %), followed by MD-MTD (R² of 0.95, 9.39 %), with the smallest improvement obtained by GAN (R² of 0.92, 5.98 %). Boxplots show that MD-MTD-GPR predictions do not fully capture the observed data distributions. GAN accurately replicates the data distribution, while t-SNE-GPR achieves the highest prediction accuracy and handles data fluctuations. GPR accuracy improves as the number of virtual samples increases but tends to decrease when the number exceeds 258 in this study. These results can guide efforts to improve GPR accuracy with limited datasets and can help improve water quality management and drinking water safety in regions with sparse monitoring data.
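The abstract compares models by their coefficient of determination, R². As a minimal sketch of how that metric is usually computed (the function name `r2_score` and the sample values are illustrative assumptions, not data or code from the paper), R² is one minus the ratio of the residual sum of squares to the total sum of squares:

```python
from typing import Sequence


def r2_score(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the model predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations of observations from their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted concentrations (illustrative values only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r2_score(observed, predicted)
print(round(score, 2))
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the observed mean.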
Pages: 17