Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach

Cited by: 36
Authors
Rodriguez, Rafael [1]
Pastorini, Marcos [2]
Etcheverry, Lorena [2]
Chreties, Christian [1]
Fossati, Monica [1]
Castro, Alberto [2]
Gorgoglione, Angela [1]
Affiliations
[1] Univ Republica, Fac Ingn, Inst Mecan Fluidos & Ingn Ambiental IMFIA, Montevideo 11300, Uruguay
[2] Univ Republica, Fac Ingn, Inst Computac InCo, Montevideo 11300, Uruguay
Keywords
data scarcity; water quality; missing data; univariate imputation; multivariate imputation; machine learning; hydroinformatics; PRECIPITATION RECORDS; TEMPERATURE; ACCURACY; IMPROVE; RUNOFF; RIVER; IDW
DOI
10.3390/su13116318
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Monitoring surface-water quality, followed by water-quality modeling and analysis, is essential for generating effective strategies in surface-water-resource management. However, worldwide, and particularly in developing countries, water-quality studies are limited by the lack of complete and reliable datasets of surface-water-quality variables. In this context, several statistical and machine-learning models were assessed for imputing water-quality data at six monitoring stations located in the Santa Lucia Chico river (Uruguay), a mixed lotic and lentic river system. The challenge of this study lies in the high percentage of missing data (between 50% and 70%) and the high temporal and spatial variability that characterizes the water-quality variables. The competing algorithms implement univariate and multivariate imputation methods: inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR) and K-nearest neighbors Regressor (KNNR). According to the results, more than 76% of the imputation outcomes are considered "satisfactory" (Nash-Sutcliffe efficiency, NSE > 0.45). Imputation performance is better at the monitoring stations located inside the reservoir than at those positioned along the mainstream. IDW was the model with the best imputation results, followed by RFR, HR and SVR. The approach proposed in this study is expected to help water-resource researchers and managers augment water-quality datasets and overcome the missing-data issue, and so increase the number of future water-quality studies.
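The two quantities the abstract relies on, IDW imputation and the NSE score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the station distances, and the weighting exponent `power=2.0` are assumptions made for illustration, and the abstract does not say whether IDW was applied across stations or across time. NSE is written in its standard form, one minus the ratio of the squared error to the variance of the observations about their mean.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def idw_impute(values, distances, power=2.0):
    """Inverse-distance-weighted estimate of one missing value.

    values    : readings at neighbouring stations (NaN where also missing)
    distances : distance from the target station to each neighbour (hypothetical units)
    power     : weighting exponent; 2.0 is a common default, assumed here
    """
    vals = np.asarray(values, dtype=float)
    dist = np.asarray(distances, dtype=float)
    available = ~np.isnan(vals)          # use only neighbours that have a reading
    if not available.any():
        return float("nan")              # nothing to interpolate from
    weights = 1.0 / dist[available] ** power
    return float(np.sum(weights * vals[available]) / np.sum(weights))


if __name__ == "__main__":
    # Hypothetical readings at three neighbouring stations; the second is missing.
    estimate = idw_impute([10.0, np.nan, 30.0], [1.0, 2.0, 1.0])
    print("IDW estimate:", estimate)

    # Score imputed values against held-out observations (made-up numbers).
    held_out = [1.0, 2.0, 3.0]
    imputed = [1.1, 1.9, 3.2]
    score = nse(held_out, imputed)
    print("NSE:", score, "satisfactory" if score > 0.45 else "unsatisfactory")
```

An NSE of 1 means a perfect match, while 0 means the imputation does no better than the mean of the observations; the 0.45 threshold for "satisfactory" is the one quoted in the abstract.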
Pages: 17
Related papers
50 in total
  • [21] Evaluation of Machine Learning Classification Algorithms & Missing Data Imputation Techniques
    Nwulu, Nnamdi I.
    2017 INTERNATIONAL ARTIFICIAL INTELLIGENCE AND DATA PROCESSING SYMPOSIUM (IDAP), 2017
  • [22] Graph Machine Learning for Improved Imputation of Missing Tropospheric Ozone Data
    Betancourt, Clara
    Li, Cathy W. Y.
    Kleinert, Felix
    Schultz, Martin G.
    ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2023, 57 (46) : 18246 - 18258
  • [23] Imputation of continuous missing values in profile data
    Yang, Luo
    Wang, Kaibo
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2022, 38 (07) : 3644 - 3662
  • [24] Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation
    Alamoodi, A. H.
    Zaidan, B. B.
    Zaidan, A. A.
    Albahri, O. S.
    Chen, Juliana
    Chyad, M. A.
    Garfan, Salem
    Aleesa, A. M.
    CHAOS SOLITONS & FRACTALS, 2021, 151
  • [25] WATER-QUALITY DATA FOR WATER-QUALITY DECISIONS
    LYON, WA
    HUNTER, JS
    WATER SCIENCE AND TECHNOLOGY, 1981, 13 (03) : 237 - 243
  • [26] Imputation of missing sub-hourly precipitation data in a large sensor network: A machine learning approach
    Chivers, Benedict D.
    Wallbank, John
    Cole, Steven J.
    Sebek, Ondrej
    Stanley, Simon
    Fry, Matthew
    Leontidis, Georgios
    JOURNAL OF HYDROLOGY, 2020, 588
  • [27] Energetic Map Data Imputation: A Machine Learning Approach
    Straub, Tobias
    Nagy, Madalina Mandy
    Sidorov, Maxim
    Tonetto, Leonardo
    Frey, Michael
    Gauterin, Frank
    ENERGIES, 2020, 13 (04)
  • [28] Data variability in the imputation quality of missing data
    Stochero, Elisandra Lucia Moro
    Lucio, Alessandro Dal'Col
    Jacobi, Luciane Flores
    ACTA SCIENTIARUM-AGRONOMY, 2024, 46
  • [29] Missing Data Imputation for Supervised Learning
    Poulos, Jason
    Valle, Rafael
    APPLIED ARTIFICIAL INTELLIGENCE, 2018, 32 (02) : 186 - 196
  • [30] Machine Learning for the Relationship of High-Energy Electron Flux between GEO and MEO with Application to Missing Values Imputation for Beidou MEO Data
    Cui, Ruifei
    Jiang, Yu
    Tian, Chao
    Zhang, Riwei
    Hu, Sihui
    Li, Jiyun
    OPEN ASTRONOMY, 2021, 30 (01) : 62 - 72