Data preprocessing impact on machine learning algorithm performance

Cited by: 7
Authors
Amato, Alberto [1]
Di Lecce, Vincenzo [1]
Affiliations
[1] Politecn Bari, Dept Elect & Informat Engn, Bari, Italy
Keywords
data analysis; PCA; SPQR; FCM; dimensionality reduction; feature selection; approximations; eigenmaps
DOI
10.1515/comp-2022-0278
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The popularity of artificial intelligence applications is on the rise, and they are producing better outcomes in numerous fields of research. However, the effectiveness of these applications relies heavily on the quantity and quality of data used. While the volume of data available has increased significantly in recent years, this does not always lead to better results, as the information content of the data is also important. This study aims to evaluate a new data preprocessing technique called semi-pivoted QR (SPQR) approximation for machine learning. This technique is designed for approximating sparse matrices and acts as a feature selection algorithm. To the best of our knowledge, it has not been previously applied to data preprocessing in machine learning algorithms. The study aims to evaluate the impact of SPQR on the performance of an unsupervised clustering algorithm and compare its results to those obtained using principal component analysis (PCA) as the preprocessing algorithm. The evaluation is conducted on various publicly available datasets. The findings suggest that the SPQR algorithm can produce outcomes comparable to those achieved using PCA without altering the original dataset.
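The contrast the abstract draws, selecting original features rather than projecting them as PCA does, can be illustrated with a minimal sketch. It uses a plain greedy column-pivoted QR (Gram-Schmidt style) as a stand-in. This is an assumption made for illustration and is not the SPQR algorithm evaluated in the paper. The function names `select_columns_pivoted_qr` and `keep_columns` and the toy data are hypothetical.

```python
import math


def select_columns_pivoted_qr(rows, k):
    """Return indices of up to k columns picked by greedy column pivoting.

    Illustrative stand-in only: NOT the paper's SPQR algorithm.
    """
    n_cols = len(rows[0])
    # Work on a column-major copy so the input rows stay untouched.
    cols = [[float(r[j]) for r in rows] for j in range(n_cols)]
    chosen = []
    for _ in range(min(k, n_cols)):
        # Pivot on the remaining column with the largest residual norm.
        norms = [sum(v * v for v in c) for c in cols]
        for j in chosen:
            norms[j] = -1.0
        pivot = max(range(n_cols), key=lambda j: norms[j])
        if norms[pivot] <= 1e-12:
            break  # what is left lies (numerically) in the span of the chosen columns
        chosen.append(pivot)
        scale = math.sqrt(norms[pivot])
        q = [v / scale for v in cols[pivot]]
        # Remove the component along q from every column not yet chosen.
        for j in range(n_cols):
            if j in chosen:
                continue
            dot = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [b - dot * a for a, b in zip(q, cols[j])]
    return chosen


def keep_columns(rows, indices):
    """Keep the selected original features; values are not transformed."""
    return [[r[j] for j in indices] for r in rows]


# Toy data (made up): column 1 is exactly twice column 0, so one of them is redundant.
data = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.0, 0.0],
]
selected = select_columns_pivoted_qr(data, 3)
reduced = keep_columns(data, selected)
```

Because the reduced matrix is made of untouched columns of the input, each retained feature keeps its original meaning, whereas PCA components are linear combinations of all features.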
Pages: 16
Related papers
50 in total
  • [31] Effect of Data Preprocessing in the Detection of Epilepsy using Machine Learning Techniques
    Sabarivani, A.
    Ramadevi, R.
    Pandian, R.
    Krishnamoorthy, N. R.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2021, 80 (12): 1066-1077
  • [32] Data preprocessing for machine analysis of sales representatives' key performance indicators
    Vladova, Alla Yu
    Shek, Elena D.
    BIZNES INFORMATIKA-BUSINESS INFORMATICS, 2021, 15 (03): 48-59
  • [33] Machine Learning-Based Imputation Approach with Dynamic Feature Extraction for Wireless RAN Performance Data Preprocessing
    Dahj, Jean Nestor M.
    Ogudo, Kingsley A. A.
    SYMMETRY-BASEL, 2023, 15 (06)
  • [34] Investigating the role of data preprocessing, hyperparameters tuning, and type of machine learning algorithm in the improvement of drowsy EEG signal modeling
    Farhangi, Farbod
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2022, 15
  • [35] Microbiome Preprocessing Machine Learning Pipeline
    Jasner, Yoel Y.
    Belogolovski, Anna
    Ben-Itzhak, Meirav
    Koren, Omry
    Louzoun, Yoram
    FRONTIERS IN IMMUNOLOGY, 2021, 12
  • [36] Data reduction algorithm for machine learning and data mining
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027: 276-285
  • [37] Effect of data preprocessing and machine learning hyperparameters on mass spectrometry imaging models
    Gardner, Wil
    Winkler, David A.
    Alexander, David L. J.
    Ballabio, Davide
    Muir, Benjamin W.
    Pigram, Paul J.
    JOURNAL OF VACUUM SCIENCE & TECHNOLOGY A, 2023, 41 (06)
  • [38] Exploring Data Preprocessing and Machine Learning Methods for Forecasting Worldwide Fertilizers Consumption
    Pacheco, Carla
    Guimaraes, Mario
    Bezerra, Eduardo
    Lobosco, Dacy
    Soares, Jorge
    Gonzales, Pedro Henrique
    Andrade, Adalberto
    de Souza, Cristina Gomes
    Ogasawara, Eduardo
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [39] Comparative Analysis of Data Preprocessing Methods in Machine Learning for Breast Cancer Classification
    Stockton, Timothy
    Peddle, Brandon
    Gaulin, Angelica
    Wiechert, Emma
    Lu, Wei
    ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOL 3, AINA 2024, 2024, 201: 268-279
  • [40] PHYSICS-BASED AUTOMATED DATA PREPROCESSING (ADP) FOR MACHINE LEARNING APPLICATIONS
    Sotubadi, Saleh Valizadeh
    Vinh Nguyen
    PROCEEDINGS OF ASME 2023 INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, IDETC-CIE2023, VOL 2, 2023