Artificial Intelligence Generated Synthetic Datasets as the Remedy for Data Scarcity in Water Quality Index Estimation

Cited by: 9
Authors
Chia, Min Yan [1]
Koo, Chai Hoon [1]
Huang, Yuk Feng [1]
Di Chan, Wei [1]
Pang, Jia Yin [1]
Affiliations
[1] Univ Tunku Abdul Rahman, Lee Kong Chian Fac Engn & Sci, Dept Civil Engn, Bandar Sungai Long, Selangor, Malaysia
Keywords
synthetic data; artificial intelligence; back-propagation neural network; water quality index
DOI
10.1007/s11269-023-03650-6
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
The water quality index (WQI) is used in many countries and regions as a numeric representation of the condition of water resources. However, computing the WQI requires a host of water quality variables. Although machine learning models have proven to be a promising tool for estimating the WQI with fewer inputs, sufficient data must be collected so that the models can be trained well. This poses a great challenge in places that lack the data collection infrastructure needed to meet the requirements of machine learning models, which makes data scarcity a major issue to be tackled. This study covered two major rivers that served as water intakes in Peninsular Malaysia (the Selangor River and the Skudai River), where four synthetic data generation methods, namely the conditional tabular generative adversarial network (CTGAN), the tabular variational autoencoder (TVAE), the Gaussian copula (GC) and the copula generative adversarial network (CopulaGAN), were used to synthesise datasets based on the real dataset. Using the pairwise correlation difference (PCD), the Kullback-Leibler divergence (KLD) and the Kolmogorov-Smirnov (KS) test, the best synthetic datasets were selected for the two rivers. CopulaGAN1 and CopulaGAN2 yielded the best small and large synthetic datasets for the Selangor River, scoring the lowest PCD, KLD and KS statistics. For the Skudai River, TVAE1 and TVAE2 were chosen. The real and synthetic datasets were used to train a back-propagation neural network (BPNN) for WQI estimation. Across the various evaluation metrics, increasing the size of the training data with synthetic data had a positive impact on the performance of the BPNN. The BPNN trained with CopulaGAN2 (Selangor River) and TVAE2 (Skudai River) yielded more accurate estimations than those trained with the actual and smaller datasets.
Data were insufficient to train machine learning models well in developing regions. Synthetic data methods can overcome the data scarcity issue in Malaysia. CopulaGAN and TVAE outperformed the other methods at the Selangor River and the Skudai River, respectively. BPNN trained with synthetic datasets estimated the WQI with higher accuracy.
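The selection step described in the abstract (choosing the synthetic dataset with the lowest PCD, KLD and KS statistics) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact definitions, so the sketch assumes PCD is the Frobenius norm of the difference between Pearson correlation matrices, KLD is computed on histograms with shared bin edges, and the metrics are averaged over columns and summed to rank candidates. The toy data, the candidate names `faithful` and `poor`, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import entropy, ks_2samp


def pairwise_correlation_difference(real, synth):
    """Assumed PCD: Frobenius norm of the difference of correlation matrices."""
    corr_real = np.corrcoef(real, rowvar=False)
    corr_synth = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(corr_real - corr_synth, ord="fro"))


def kl_divergence(real_col, synth_col, bins=20, eps=1e-9):
    """Assumed KLD: divergence between histograms built on shared bin edges."""
    lo = min(real_col.min(), synth_col.min())
    hi = max(real_col.max(), synth_col.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real_col, bins=edges)
    q, _ = np.histogram(synth_col, bins=edges)
    # eps avoids empty bins; scipy's entropy normalises both inputs to sum to 1.
    return float(entropy(p + eps, q + eps))


def score(real, synth):
    """Return PCD plus column-averaged KLD and KS statistics (lower is better)."""
    n_cols = real.shape[1]
    kld = np.mean([kl_divergence(real[:, j], synth[:, j]) for j in range(n_cols)])
    ks = np.mean([ks_2samp(real[:, j], synth[:, j]).statistic for j in range(n_cols)])
    return {
        "pcd": pairwise_correlation_difference(real, synth),
        "kld": float(kld),
        "ks": float(ks),
    }


# Toy stand-in for a real water quality table: three correlated variables.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.8], [0.8, 1.0, 0.8], [0.8, 0.8, 1.0]])
real = rng.multivariate_normal(np.zeros(3), cov, size=500)

# Two hypothetical synthetic candidates: one preserves the joint structure,
# the other ignores the correlations and shifts the marginals.
candidates = {
    "faithful": rng.multivariate_normal(np.zeros(3), cov, size=500),
    "poor": rng.normal(loc=1.5, scale=1.0, size=(500, 3)),
}

scores = {name: score(real, synth) for name, synth in candidates.items()}
# Ranking by the plain sum of the three metrics is an assumption of this sketch.
best = min(scores, key=lambda name: sum(scores[name].values()))
print(best, scores[best])
```

In practice the candidates would come from fitted generators such as CTGAN, TVAE, GC or CopulaGAN rather than from the toy samplers used here, and the ranking rule would follow whatever criterion the study applied.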
Pages: 6183-6198
Page count: 16