A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning

Cited by: 60
Authors
Elreedy, Dina [1]
Atiya, Amir F. [1]
Kamalov, Firuz [2]
Affiliations
[1] Cairo Univ, Comp Engn Dept, Giza 12613, Egypt
[2] Canadian Univ Dubai, Dept Elect Engn, Dubai 117781, U Arab Emirates
Keywords
SMOTE; Class imbalance; Distribution density; Over-sampling; Minority class; Sampling approach; Classification
DOI
10.1007/s10994-022-06296-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance occurs when the class distribution is unequal: one class is under-represented (the minority class), while the other class has significantly more samples in the data (the majority class). The class imbalance problem is prevalent in many real-world applications, and the under-represented minority class is generally the class of interest. The synthetic minority over-sampling technique (SMOTE) is considered the most prominent method for handling imbalanced data. SMOTE generates new synthetic data patterns by performing linear interpolation between minority class samples and their K nearest neighbors. However, the SMOTE-generated patterns do not necessarily conform to the original minority class distribution. This paper develops a novel theoretical analysis of SMOTE by deriving the probability distribution of the SMOTE-generated samples. To the best of our knowledge, this is the first work to derive a mathematical formulation for the probability distribution of SMOTE patterns. This allows us to compare the density of the generated samples with the true underlying class-conditional density, in order to assess how representative the generated samples are. The derived formula is verified by evaluating it on a number of densities and comparing the results against densities computed and estimated empirically.
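The abstract summarizes SMOTE's generation step: each synthetic pattern lies on the line segment between a minority sample and one of its K nearest minority neighbors. The NumPy sketch below shows that interpolation. It is not the authors' code and does not implement the paper's derived density. It rests on several assumptions: a brute-force Euclidean neighbor search, seed samples drawn uniformly at random (the original SMOTE description instead cycles through every minority sample), and a hypothetical function name `smote`.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Hypothetical SMOTE-style oversampler (illustrative sketch, not the paper's code).

    Each synthetic point is x + w * (x_nn - x), where x is a minority sample
    chosen uniformly at random (an assumption), x_nn is one of its k nearest
    minority neighbours (Euclidean), and w ~ Uniform[0, 1).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    if X_min.ndim != 2:
        raise ValueError("X_min must be a 2-D array of shape (n_samples, n_features)")
    n = X_min.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of minority samples")

    # Brute-force pairwise squared Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour

    # Row i holds the indices of the k nearest neighbours of sample i.
    nn = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # seed sample for each new point
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbours to use
    neigh = nn[base, pick]
    w = rng.random((n_new, 1))              # interpolation weight per new point

    # Linear interpolation: every new point lies on a segment between two minority samples.
    return X_min[base] + w * (X_min[neigh] - X_min[base])


if __name__ == "__main__":
    # Toy minority class drawn from a known density, N(0, 1) in one dimension.
    X = np.random.default_rng(0).normal(size=(200, 1))
    S = smote(X, n_new=5000, k=5, rng=1)
    # Compare simple statistics of the real and synthetic samples.
    print("original mean/var: ", X.mean(), X.var())
    print("synthetic mean/var:", S.mean(), S.var())
```

Because the minority class in the usage block is drawn from a known density, comparing statistics or a histogram of `S` against N(0, 1) gives a rough empirical check. The paper carries out that comparison analytically via its derived SMOTE density.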
Pages: 4903-4923
Page count: 21
Related papers
50 in total
  • [31] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015
  • [32] Imbalanced Classification Based on Minority Clustering Synthetic Minority Oversampling Technique With Wind Turbine Fault Detection Application
    Yi, Huaikuan
    Jiang, Qingchao
    Yan, Xuefeng
    Wang, Bei
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (09) : 5867 - 5875
  • [33] Hybrid oversampling technique for imbalanced pattern recognition: Enhancing performance with Borderline Synthetic Minority oversampling and Generative Adversarial Networks
    Ahsan, Md Manjurul
    Raman, Shivakumar
    Liu, Yingtao
    Siddique, Zahed
    Machine Learning with Applications, 2025, 20
  • [34] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Elyan, Eyad
    Moreno-Garcia, Carlos Francisco
    Jayne, Chrisina
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (07) : 2839 - 2851
  • [35] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [37] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
    Xu, Zhaozhao
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
  • [39] MWMOTE-Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Yao, Xin
    Murase, Kazuyuki
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (02) : 405 - 425
  • [40] P-SMOTE: ONE OVERSAMPLING TECHNIQUE FOR CLASS IMBALANCED TEXT CLASSIFICATION
    Wang, Jingjing
    Lu, Wen Feng
    Loh, Han Tong
    PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2011, VOL 2, PTS A AND B, 2012 : 1089 - 1098