Comparison of missing data imputation methods using weather data

Cited by: 2
Authors
Nida, Hafiza [1]
Kashif, Muhammad [1]
Khan, Muhammad Imran [1]
Ghamkhar, Madiha [1]
Affiliations
[1] Univ Agr Faisalabad, Fac Sci, Dept Math & Stat, Faisalabad, Pakistan
Source
Keywords
Rainfall; temperature; missing data; imputation methods; root mean square error; TEMPERATURE; PAKISTAN; CLIMATE; CROP;
DOI
10.21162/PAKJAS/23.228
Chinese Library Classification (CLC) number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Researchers and data analysts commonly face challenges with missing data when analyzing large data sets in their fields of study. Missing data must be handled properly to obtain reliable research outcomes. The objective of this research is to evaluate different imputation techniques for handling missing observations in weather data. For this purpose, data on daily rainfall, maximum temperature (Tmax) and minimum temperature (Tmin) at 23 stations in Pakistan were obtained from the Pakistan Meteorological Department for the years 1981 to 2020. Each variable has about 14610 observations in total, while the number of missing observations, referred to as the size of missingness, differs by variable and by station. Four techniques were considered for estimating the missing observations at each station: mean imputation, k nearest neighbors (KNN) imputation, predictive mean matching (PMM) imputation and sample imputation. Because the size of missingness varied from station to station, the imputation technique was selected station by station as the one with the smallest root mean square error (RMSE). KNN was the most appropriate technique for estimating missing rainfall observations at all stations, while mean imputation is recommended for the Tmax and Tmin data in preference to the other imputation methods.
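The RMSE-based comparison described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic rather than the Pakistan Meteorological Department records, the function names (rmse, mean_impute, sample_impute, knn_impute, evaluate) are hypothetical, the KNN variant uses a single fully observed predictor, and PMM is omitted for brevity. The idea is to hide a share of known values, impute them with each method, and keep the method with the smallest RMSE on the hidden values.

```python
import math
import random


def rmse(true_vals, imputed_vals):
    """Root mean square error between held-out true values and imputed values."""
    squared = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(squared) / len(squared))


def mean_impute(column):
    """Replace every missing entry (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]


def sample_impute(column, rng):
    """Replace every missing entry with a randomly drawn observed value."""
    observed = [v for v in column if v is not None]
    return [rng.choice(observed) if v is None else v for v in column]


def knn_impute(column, predictor, k=5):
    """Fill each missing entry with the mean of the k observed entries whose
    predictor values are closest (assumed variant: one complete predictor)."""
    donors = [(x, v) for x, v in zip(predictor, column) if v is not None]
    filled = []
    for x, v in zip(predictor, column):
        if v is not None:
            filled.append(v)
            continue
        nearest = sorted(donors, key=lambda d: abs(d[0] - x))[:k]
        filled.append(sum(val for _, val in nearest) / len(nearest))
    return filled


def evaluate(seed=0, n=400, missing_rate=0.1):
    """Hide a fraction of synthetic Tmax values and score each method by RMSE."""
    rng = random.Random(seed)
    # Synthetic stand-in data: Tmin is complete, Tmax is correlated with it.
    tmin = [rng.gauss(18.0, 6.0) for _ in range(n)]
    tmax = [t + 10.0 + rng.gauss(0.0, 1.5) for t in tmin]
    hidden = set(rng.sample(range(n), int(n * missing_rate)))
    masked = [None if i in hidden else v for i, v in enumerate(tmax)]
    candidates = {
        "mean": mean_impute(masked),
        "sample": sample_impute(masked, rng),
        "knn": knn_impute(masked, tmin),
    }
    idx = sorted(hidden)
    truth = [tmax[i] for i in idx]
    return {
        name: rmse(truth, [filled[i] for i in idx])
        for name, filled in candidates.items()
    }


scores = evaluate()
best = min(scores, key=scores.get)  # method with the smallest RMSE
print(scores, best)
```

Because the synthetic Tmax here is built to depend strongly on Tmin, the ranking this sketch produces says nothing about the paper's findings; with real station data the same selection rule would be applied separately per station and per variable.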
Pages: 327-336
Number of pages: 10
Related papers
50 in total
  • [21] A Comparison of Hot Deck Imputation and Substitution Methods in The Estimation of Missing Data
    Yesilova, Abdullah
    Kaya, Yilmaz
    Almali, M. Nuri
    GAZI UNIVERSITY JOURNAL OF SCIENCE, 2011, 24 (01): : 69 - 75
  • [22] Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health
    Pan, Steven
    Chen, Sixia
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2023, 20 (02)
  • [23] Comparison of Imputation Methods for Missing Rate of Perceived Exertion Data in Rugby
    Epp-Stobbe, Amarah
    Tsai, Ming-Chang
    Klimstra, Marc
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2022, 4 (04): : 827 - 838
  • [24] A Comparison of Multiple Imputation Methods for Recovering Missing Data in Hydrological Studies
    Hamzah, Fatimah Bibi
    Hamzah, Firdaus Mohd
    Razali, Siti Fatin Mohd
    Samad, Hafiza
    CIVIL ENGINEERING JOURNAL-TEHRAN, 2021, 7 (09): : 1608 - 1619
  • [25] Comparison of Missing Value Imputation Methods for Malaysian Hourly Rainfall Data
    Mazlan, Noorhafizah
    Rahman, Nurul Aishah
    Deni, Sayang Mohd
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS & STATISTICS, 2015, 53 (06): : 209 - 215
  • [26] Missing data and imputation methods in partition of variables
    da Silva, AL
    Saporta, G
    Bacelar-Nicolau, H
    CLASSIFICATION, CLUSTERING, AND DATA MINING APPLICATIONS, 2004, : 631 - 637
  • [27] Imputation methods for missing data for polygenic models
    Brooke Fridley
    Kari Rabe
    Mariza de Andrade
    BMC Genetics, 4
  • [28] Analyzing Coarsened and Missing Data by Imputation Methods
    van Der Burg, Lars L. J.
    Bohringer, Stefan
    Bartlett, Jonathan W.
    Bosse, Tjalling
    Horeweg, Nanda
    de Wreede, Liesbeth C.
    Putter, Hein
    STATISTICS IN MEDICINE, 2025, 44 (06)
  • [29] Imputation methods for missing data for polygenic models
    Fridley, B
    Rabe, K
    de Andrade, M
    BMC GENETICS, 2003, 4 (Suppl 1)
  • [30] Analysis of missing data and comparing the accuracy of imputation methods using wheat crop data
    Saini, Preeti
    Nagpal, Bharti
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 40393 - 40414