Comparison of missing data imputation methods using weather data

Cited by: 2
Authors
Nida, Hafiza [1 ]
Kashif, Muhammad [1 ]
Khan, Muhammad Imran [1 ]
Ghamkhar, Madiha [1 ]
Affiliations
[1] Univ Agr Faisalabad, Fac Sci, Dept Math & Stat, Faisalabad, Pakistan
Source
Keywords
Rainfall; temperature; missing data; imputation methods; root mean square error; TEMPERATURE; PAKISTAN; CLIMATE; CROP;
DOI
10.21162/PAKJAS/23.228
Chinese Library Classification
S [Agricultural Sciences];
Discipline classification code
09 ;
Abstract
Researchers and data analysts commonly face challenges when large data sets in their fields contain missing data. Missing data must be handled properly to obtain reliable results. The objective of this research is to evaluate different imputation techniques for handling missing observations in weather data. For this purpose, data on daily rainfall, maximum temperature (Tmax) and minimum temperature (Tmin) at 23 stations in Pakistan were obtained from the Pakistan Meteorological Department for the years 1981 to 2020. Each variable has about 14,610 observations in total, and the number of missing observations, called the size of missingness, differs by variable and by station. Four techniques were considered for estimating the missing observations at each station: mean imputation, k-nearest neighbors (KNN) imputation, predictive mean matching (PMM) imputation and sample imputation. Because the size of missingness varied from station to station, the imputation technique was chosen station by station as the one with the minimum root mean square error (RMSE). Compared with the other methods, KNN was the most appropriate technique for estimating missing rainfall observations at all stations, while mean imputation is recommended for the Tmax and Tmin data.
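The evaluation described in the abstract (hide values that are actually known, impute them, and score each method by RMSE) can be sketched as follows. This is a minimal illustration and not the authors' code. The data are synthetic, and the function names, the 10% masking rate and k = 5 are all assumptions made for the example. PMM is omitted for brevity, and the KNN variant here is a simplified one that assumes the predictor columns are fully observed. The ranking printed for synthetic data need not match the station-wise results reported in the paper.

```python
import numpy as np


def rmse(true, pred):
    """Root mean square error between true and imputed values."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((true - pred) ** 2)))


def mean_impute(x):
    """Replace NaNs with the mean of the observed values."""
    x = np.asarray(x, dtype=float).copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x


def sample_impute(x, rng):
    """Replace NaNs with random draws (with replacement) from observed values."""
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    x[missing] = rng.choice(x[~missing], size=int(missing.sum()), replace=True)
    return x


def knn_impute(X, target_col, k=3):
    """Fill NaNs in one column with the mean of that column over the k rows
    that are closest in the remaining columns (Euclidean distance).
    Assumes the remaining columns contain no missing values."""
    X = np.asarray(X, dtype=float)
    out = X[:, target_col].copy()
    others = np.delete(X, target_col, axis=1)
    observed = ~np.isnan(out)
    for i in np.where(~observed)[0]:
        dist = np.sqrt(((others[observed] - others[i]) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        out[i] = out[observed][nearest].mean()
    return out


# Synthetic one-year series (hypothetical, for illustration only).
rng = np.random.default_rng(0)
days = np.arange(365)
tmax = 30 + 8 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1.5, days.size)
tmin = tmax - 12 + rng.normal(0, 1.5, days.size)

# Hide about 10% of the known Tmax values so the imputations can be scored.
mask = rng.random(days.size) < 0.10
tmax_missing = tmax.copy()
tmax_missing[mask] = np.nan

X = np.column_stack([tmax_missing, tmin])
results = {
    "mean": rmse(tmax[mask], mean_impute(tmax_missing)[mask]),
    "sample": rmse(tmax[mask], sample_impute(tmax_missing, rng)[mask]),
    "knn": rmse(tmax[mask], knn_impute(X, target_col=0, k=5)[mask]),
}
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:>6}: RMSE = {score:.3f}")
```

Repeating the same masking and scoring per station and per variable, then picking the method with the smallest RMSE, mirrors the station-wise selection rule the abstract describes.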
Pages: 327 - 336
Page count: 10
Related papers
50 in total
  • [41] Imputation methods for missing data in educational diagnostic evaluation
    Fernandez-Alonso, Ruben
    Suarez-Alvarez, Javier
    Muniz, Jose
    PSICOTHEMA, 2012, 24 (01) : 167 - 175
  • [42] Imputation Methods for Multiple Regression with Missing Heteroscedastic Data
    Asif, Muhammad
    Samart, Klairung
    THAILAND STATISTICIAN, 2022, 20 (01): : 1 - 15
  • [43] Missing data imputation methods and their performance with biodistance analyses
    Kenyhercz, Michael W.
    Passalacqua, Nicholas V.
    AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY, 2015, 156 : 185 - 185
  • [44] Evaluating Imputation Methods for Missing Data in a MCI Dataset
    Gomez-Valades Batanero, Alba
    Rincon Zamorano, Mariano
    Martinez Tomas, Rafael
    Guerrero Martin, Juan
    ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 446 - 454
  • [45] Spectral methods for imputation of missing air quality data
    Shai Moshenberg
    Uri Lerner
    Barak Fishbain
    Environmental Systems Research, 4 (1)
  • [46] Missing Data: data replacement and imputation
    Hutcheson, Graeme
    Pampaka, Maria
    JOURNAL OF MODELLING IN MANAGEMENT, 2012, 7 (02)
  • [47] Using association rule for missing data imputation
    Wu, Jianhua
    Song, Qinbao
    Shen, Junyi
    Journal of Information and Computational Science, 2007, 4 (04): : 1155 - 1161
  • [48] MICROARRAY MISSING DATA IMPUTATION USING REGRESSION
    Bayrak, Tuncay
    Ogul, Hasan
    2017 13TH IASTED INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING (BIOMED), 2017, : 68 - 73
  • [49] IMPUTATION OF MISSING PHYSICAL PERFORMANCE DATA: A COMPARISON OF APPROACHES
    Ailshire, J. A.
    Zhang, Y.
    Crimmins, E.
    Ofstedal, M.
    GERONTOLOGIST, 2015, 55 : 523 - 523
  • [50] Handling incomplete and missing data in water network database using imputation methods
    Kabir, Golam
    Tesfamariam, Solomon
    Hemsing, Jordi
    Sadiq, Rehan
    SUSTAINABLE AND RESILIENT INFRASTRUCTURE, 2020, 5 (06) : 365 - 377