MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification

被引:0
|
作者
Wang, Jiao [1 ,2 ]
Awang, Norhashidah [1 ]
机构
[1] Univ Sains Malaysia, Sch Math Sci, George Town 11800, Malaysia
[2] Puer Univ, Sch Math & Stat, Puer 665000, Peoples R China
来源
IEEE ACCESS | 2024年 / 12卷
关键词
Multi-class imbalanced dataset; classification; SMOTE algorithm; synthetic minority; oversampling; DATA-SETS;
D O I
10.1109/ACCESS.2024.3521120
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The learning of multi-class imbalance problems presents greater challenges and has fewer research results compared to binary imbalance problems. Resampling techniques are widely employed to address data imbalance problems. However, the majority of existing resampling methods are designed specifically for binary imbalance datasets and demonstrate significant limitations when applied to multi-class imbalance datasets. Therefore, this study introduces the MKC-SMOTE algorithm, a novel and effective method specifically tailored for multi-class imbalanced datasets. During the pre-processing phase, the algorithm takes into account the distribution of all classes and employs the k-nearest neighbors (kNN) algorithm to identify appropriate original samples for synthesizing minority class samples. It then utilizes an enhanced SMOTE algorithm for interpolation. In the post-processing phase, potentially misleading synthesized samples are eliminated by the undersampling technique. Consequently, the MKC-SMOTE algorithm generates high-quality minority class samples by strategically exploring the distributional regions of the classes. Extensive experiments were conducted on 21 real-world datasets, comparing the MKC-SMOTE algorithm with six imbalance problem handling methods and two classifiers. The results demonstrate that the MKC-SMOTE algorithm significantly enhances the classification performance of multi-class imbalanced datasets and outperforms several popular and state-of-the-art oversampling methods.
引用
收藏
页码:196929 / 196938
页数:10
相关论文
共 50 条
  • [21] SCALA: Scaling algorithm for multi-class imbalanced classification A novel algorithm specifically designed for multi-class multiple minority imbalanced data problems.
    Barzinji, Ala O.
    Ma, Jixin
    Ma, Chaoying
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2023, 2023, : 68 - 73
  • [22] Selecting local ensembles for multi-class imbalanced data classification
    Krawczyk, Bartosz
    Cano, Alberto
    Wozniak, Michal
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [23] Undersampling with Support Vectors for Multi-Class Imbalanced Data Classification
    Krawczyk, Bartosz
    Bellinger, Colin
    Corizzo, Roberto
    Japkowicz, Nathalie
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [24] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Elyan, Eyad
    Moreno-Garcia, Carlos Francisco
    Jayne, Chrisina
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (07): : 2839 - 2851
  • [25] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Elyan, Eyad
    Moreno-Garcia, Carlos Francisco
    Jayne, Chrisina
    Neural Computing and Applications, 2021, 33 (07) : 2839 - 2851
  • [26] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Eyad Elyan
    Carlos Francisco Moreno-Garcia
    Chrisina Jayne
    Neural Computing and Applications, 2021, 33 : 2839 - 2851
  • [27] An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md.
    2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019,
  • [28] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [29] Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Zhang, Yibo
    Zhang, Ruifeng
    Chen, Si
    INTELLIGENT DATA ANALYSIS, 2022, 26 (03) : 599 - 614