Machine Learning Based Missing Data Imputation in Categorical Datasets

Cited by: 1
Authors
Ishaq, Muhammad [1]
Zahir, Sana [1]
Iftikhar, Laila [1]
Bulbul, Mohammad Farhad [2]
Rho, Seungmin [3]
Lee, Mi Young [4]
Affiliations
[1] Univ Agr Peshawar, Inst Comp Sci & Informat Technol, Peshawar 25000, Khyber Pakhtunk, Pakistan
[2] Jashore Univ Sci & Technol, Dept Math, Jashore 7408, Bangladesh
[3] Chung Ang Univ, Dept Ind Secur, Seoul 06974, South Korea
[4] Chung Ang Univ, Dept Res, Seoul 06974, South Korea
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Data cleansing; missing data imputation; classification; regression and categorical datasets; MULTIPLE IMPUTATION;
DOI
10.1109/ACCESS.2024.3411817
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This research investigated the use of machine learning algorithms to predict and fill in missing values in categorical datasets. The emphasis was on ensemble models constructed using the Error Correction Output Codes (ECOC) framework, including models based on SVM and KNN as well as a hybrid classifier that combines SVM, KNN, and MLP models. Three diverse datasets (CPU, Hypothyroid, and Breast Cancer) were used to validate these algorithms. Results indicated that these machine learning techniques performed well in predicting and completing missing data, with effectiveness varying by dataset and missing-data pattern. Compared with standalone models, ensemble models that used the ECOC framework significantly improved prediction accuracy and robustness. Despite these encouraging results, deep learning for missing data imputation faces obstacles, including the need for large amounts of labeled data and the risk of overfitting. Future research should evaluate the feasibility and efficacy of deep learning algorithms for missing data imputation.
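The idea of ECOC-based imputation of a categorical column can be illustrated with a minimal sketch. This is not the authors' implementation: the toy rows, the three-bit code matrix, and the function names (`hamming`, `knn_binary`, `ecoc_impute`) are all hypothetical, and only a simple Hamming-distance KNN is used as the base learner (the SVM and MLP learners from the paper are not reproduced). Each bit of a class codeword is predicted by its own binary learner, and the missing label is decoded as the class whose codeword is nearest to the predicted bits.

```python
def hamming(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_binary(train_x, train_bits, query, k=3):
    """Majority vote (0 or 1) among the k nearest rows by Hamming distance."""
    ranked = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    votes = [train_bits[i] for i in ranked[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def ecoc_impute(rows, target, code_matrix, k=3):
    """Fill None entries in column `target` via ECOC-decoded binary KNN votes."""
    def features(row):
        return [v for j, v in enumerate(row) if j != target]

    # Rows with an observed target value serve as training data.
    complete = [r for r in rows if r[target] is not None]
    train_x = [features(r) for r in complete]
    n_bits = len(next(iter(code_matrix.values())))

    filled = []
    for row in rows:
        new_row = list(row)
        if row[target] is None:
            # One binary learner per codeword bit.
            predicted = [
                knn_binary(
                    train_x,
                    [code_matrix[c[target]][b] for c in complete],
                    features(row),
                    k,
                )
                for b in range(n_bits)
            ]
            # Decode: pick the class whose codeword is closest to the prediction.
            new_row[target] = min(
                code_matrix, key=lambda c: hamming(code_matrix[c], predicted)
            )
        filled.append(new_row)
    return filled


# Hypothetical code matrix: every pair of codewords differs in two bits.
CODES = {"low": (1, 1, 0), "mid": (1, 0, 1), "high": (0, 1, 1)}

# Hypothetical toy data: two categorical features and a categorical target.
rows = [
    ("red", "small", "low"),
    ("red", "small", "low"),
    ("red", "medium", "low"),
    ("green", "medium", "mid"),
    ("green", "medium", "mid"),
    ("green", "large", "mid"),
    ("blue", "large", "high"),
    ("blue", "large", "high"),
    ("blue", "medium", "high"),
    ("red", "small", None),   # missing target value
    ("blue", "large", None),  # missing target value
]

filled = ecoc_impute(rows, target=2, code_matrix=CODES)
print(filled[9], filled[10])
```

In practice a longer code matrix with larger pairwise distances lets the decoder tolerate mistakes by individual binary learners, which is the usual motivation for ECOC over a single multiclass model.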
Pages: 88332-88344
Page count: 13