A Boundary-Information-Based Oversampling Approach to Improve Learning Performance for Imbalanced Datasets

Cited by: 1
Authors
Li, Der-Chiang [1]
Shi, Qi-Shi [1]
Lin, Yao-San [2]
Lin, Liang-Sian [3]
Affiliations
[1] Natl Cheng Kung Univ, Dept Ind & Informat Management, Univ Rd, Tainan 70101, Taiwan
[2] Nanyang Technol Univ, Singapore Ctr Chinese Language, Ghim Moh Rd, Singapore 279623, Singapore
[3] Natl Taipei Univ Nursing & Hlth Sci, Dept Informat Management, Ming Te Rd, Taipei 112303, Taiwan
Keywords
boundary information; synthetic sample generation; imbalanced datasets; support vector machine; sampling method; SMOTE; classification; prediction; algorithm; noisy
DOI
10.3390/e24030322
Chinese Library Classification number
O4 [Physics]
Subject classification code
0702
Abstract
Oversampling is the most popular data preprocessing technique for making traditional classifiers usable on imbalanced data. Through an overall review of oversampling techniques (oversamplers), we find that some of them can be regarded as danger-information-based oversamplers (DIBOs), which create samples near danger areas so that the positive examples there can be correctly classified, while others are safe-information-based oversamplers (SIBOs), which create samples near safe areas to increase the correct rate of predicted positive values. However, DIBOs cause too many negative examples in the overlapped areas to be misclassified, and SIBOs cause too many borderline positive examples to be misclassified. Based on their advantages and disadvantages, a boundary-information-based oversampler (BIBO) is proposed. It introduces a concept of boundary information that considers safe information and dangerous information at the same time, so that the created samples lie near decision boundaries. The experimental results show that DIBOs and BIBO perform better than SIBOs on the basic metrics of recall and negative class precision; SIBOs and BIBO perform better than DIBOs on the basic metrics of specificity and positive class precision; and BIBO is better than both DIBOs and SIBOs in terms of integrated metrics.
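The four basic metrics named in the abstract can all be read off a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: the function names and the example counts are hypothetical, and G-mean and F1 are shown merely as commonly used integrated metrics, since the abstract does not say which integrated metrics were used.

```python
from math import sqrt


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def basic_metrics(tp, fn, fp, tn):
    """Basic metrics from confusion-matrix counts (positive = minority class)."""
    return {
        "recall": _ratio(tp, tp + fn),              # positives correctly found
        "specificity": _ratio(tn, tn + fp),         # negatives correctly found
        "positive_precision": _ratio(tp, tp + fp),  # predicted positives that are right
        "negative_precision": _ratio(tn, tn + fn),  # predicted negatives that are right
    }


def integrated_metrics(m):
    """Assumed examples of integrated metrics; the abstract does not name them."""
    p, r = m["positive_precision"], m["recall"]
    return {
        "g_mean": sqrt(m["recall"] * m["specificity"]),
        "f1": _ratio(2 * p * r, p + r),
    }


# Hypothetical counts for an imbalanced test set (50 positives, 950 negatives).
m = basic_metrics(tp=40, fn=10, fp=30, tn=920)
integrated = integrated_metrics(m)
print(m)
print(integrated)
```

Read this way, the trade-off in the abstract is easier to see: an oversampler that pushes the classifier to predict more positives tends to raise recall and negative class precision at the cost of specificity and positive class precision, and a more conservative one does the opposite. Integrated metrics such as the two assumed here reward a balance between the two sides.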
Pages: 16
Related papers
50 in total
  • [1] LoRAS: an oversampling approach for imbalanced datasets
    Saptarshi Bej
    Narek Davtyan
    Markus Wolfien
    Mariam Nassar
    Olaf Wolkenhauer
    Machine Learning, 2021, 110 : 279 - 301
  • [2] LoRAS: an oversampling approach for imbalanced datasets
    Bej, Saptarshi
    Davtyan, Narek
    Wolfien, Markus
    Nassar, Mariam
    Wolkenhauer, Olaf
    MACHINE LEARNING, 2021, 110 (02) : 279 - 301
  • [3] DEBOHID: A differential evolution based oversampling approach for highly imbalanced datasets
    Kaya, Ersin
    Korkmaz, Sedat
    Sahman, Mehmet Akif
    Cinar, Ahmet Cevahir
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169
  • [4] Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering
    Mujahid, Muhammad
    Kina, Erol
    Rustam, Furqan
    Villar, Monica Gracia
    Alvarado, Eduardo Silva
    Diez, Isabel De La Torre
    Ashraf, Imran
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [5] Oversampling for Mining Imbalanced Datasets: Taxonomy and Performance Evaluation
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022, 2022, 13501 : 322 - 333
  • [6] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Mohammed H. IBRAHIM
    Neural Computing and Applications, 2021, 33 : 15781 - 15806
  • [7] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Ibrahim, Mohammed H.
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (22) : 15781 - 15806
  • [8] Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques
    Pristyanto, Yoga
    Nugraha, Anggit Ferdita
    Pratama, Irfan
    Dahlan, Akhmad
    Wirasakti, Lucky Adhikrisna
    PROCEEDINGS OF THE 2021 15TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2021), 2021
  • [9] AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets
    Guan, Shaopeng
    Zhao, Xiaoyan
    Xue, Yuewei
    Pan, Hao
    INFORMATION SCIENCES, 2024, 663
  • [10] Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping
    Liu, Changhui
    Jin, Sun
    Wang, Donghong
    Luo, Zichao
    Yu, Jianbo
    Zhou, Binghai
    Yang, Changlin
    IEEE ACCESS, 2022, 10 : 91452 - 91465