Machine Learning Based Missing Data Imputation in Categorical Datasets

Cited by: 1
Authors
Ishaq, Muhammad [1]
Zahir, Sana [1]
Iftikhar, Laila [1]
Bulbul, Mohammad Farhad [2]
Rho, Seungmin [3]
Lee, Mi Young [4]
Affiliations
[1] Univ Agr Peshawar, Inst Comp Sci & Informat Technol, Peshawar 25000, Khyber Pakhtunk, Pakistan
[2] Jashore Univ Sci & Technol, Dept Math, Jashore 7408, Bangladesh
[3] Chung Ang Univ, Dept Ind Secur, Seoul 06974, South Korea
[4] Chung Ang Univ, Dept Res, Seoul 06974, South Korea
Source
IEEE Access | 2024 / Vol. 12
Funding
National Research Foundation of Singapore
Keywords
Data cleansing; missing data imputation; classification; regression and categorical datasets; multiple imputation
DOI
10.1109/ACCESS.2024.3411817
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This research investigated the use of machine learning algorithms to predict and fill in missing values in categorical datasets. The emphasis was on ensemble models built with the Error-Correcting Output Codes (ECOC) framework, including SVM-based and KNN-based models as well as a hybrid classifier that combines SVM-, KNN-, and MLP-based models. Three diverse datasets (CPU, Hypothyroid, and Breast Cancer) were used to validate these algorithms. The results indicate that these machine learning techniques predict and complete missing data effectively, although their effectiveness varies with the specific dataset and the missing-data pattern. Compared with standalone models, the ECOC-based ensembles significantly improved prediction accuracy and robustness. Despite these encouraging results, deep learning for missing data imputation still faces obstacles, including the need for large amounts of labeled data and the risk of overfitting. Future research should evaluate the feasibility and efficacy of deep learning algorithms for missing data imputation.
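The abstract does not describe how the ECOC ensembles were implemented. As a rough illustration only, the pure-Python sketch below imputes one categorical column with an ECOC ensemble whose binary learners are k-nearest-neighbour voters. The exhaustive code matrix, the Hamming (overlap) distance, the loss-based decoding rule, the assumption that the predictor columns are fully observed, and all function names (`exhaustive_code`, `ecoc_knn_predict`, `impute_column`) are assumptions made for this example, not the authors' method. The SVM and MLP learners and the hybrid classifier are not reproduced.

```python
def exhaustive_code(n_classes):
    """Exhaustive ECOC code matrix (rows = classes, columns = binary problems).

    Each column is a distinct non-trivial split of the classes; class 0 is always
    on the 1 side, so no column is the complement of another. There are
    2**(n_classes - 1) - 1 columns, so this only suits a small number of classes.
    """
    columns = []
    for m in range(2 ** (n_classes - 1) - 1):
        columns.append([1] + [(m >> (i - 1)) & 1 for i in range(1, n_classes)])
    return [[col[r] for col in columns] for r in range(n_classes)]


def hamming(a, b):
    """Overlap distance between two categorical records: count of mismatches."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_knn_predict(train_X, train_y, x, k=3):
    """Predict the class of record x with an ECOC ensemble of k-NN dichotomizers."""
    if not train_y:
        raise ValueError("no observed values to learn from")
    classes = sorted(set(train_y))
    if len(classes) == 1:
        return classes[0]
    code = exhaustive_code(len(classes))
    row_of = {c: i for i, c in enumerate(classes)}
    # The distance ignores labels, so every binary k-NN learner shares the
    # same k nearest training records (ties broken by training order).
    nearest = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], x))[:k]
    scores = []
    for j in range(len(code[0])):
        # Relabel the neighbours with column j's binary codes; score = fraction of 1s.
        bits = [code[row_of[train_y[i]]][j] for i in nearest]
        scores.append(sum(bits) / len(bits))

    # Loss-based decoding: choose the codeword with the smallest L1 distance
    # to the vector of soft binary scores.
    def loss(c):
        return sum(abs(b - s) for b, s in zip(code[row_of[c]], scores))

    return min(classes, key=loss)


def impute_column(rows, target, k=3):
    """Return a copy of rows with None in column `target` filled by ECOC k-NN.

    Assumes the other (predictor) columns are fully observed.
    """
    def features(r):
        return [v for j, v in enumerate(r) if j != target]

    observed = [r for r in rows if r[target] is not None]
    train_X = [features(r) for r in observed]
    train_y = [r[target] for r in observed]
    filled = []
    for r in rows:
        r = list(r)  # copy so the input rows are not mutated
        if r[target] is None:
            r[target] = ecoc_knn_predict(train_X, train_y, features(r), k)
        filled.append(r)
    return filled


# Made-up toy table: two predictor columns and one categorical target column.
rows = [
    ["high", "yes", "A"], ["high", "yes", "A"], ["high", "no", "A"],
    ["low", "no", "B"], ["low", "no", "B"], ["low", "yes", "B"],
    ["mid", "yes", "C"], ["mid", "no", "C"], ["mid", "yes", "C"],
    ["high", "yes", None], ["low", "no", None],
]
filled = impute_column(rows, target=2, k=3)
print([r[2] for r in filled[-2:]])  # ['A', 'B']
```

On this invented table (not one of the CPU, Hypothyroid, or Breast Cancer datasets), the two missing labels are filled with `A` and `B`. Because an exhaustive code grows as 2^(K-1) - 1 columns for K classes, a random or one-vs-rest code matrix would be the usual substitute when a categorical attribute has many levels.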
Pages: 88332-88344
Number of pages: 13