Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

Cited by: 17
Authors
Liu, Changhui [1]
Jin, Sun [2]
Wang, Donghong [3]
Luo, Zichao [4]
Yu, Jianbo [1]
Zhou, Binghai [1]
Yang, Changlin [5]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
DOI
10.1109/ACCESS.2020.3018911
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced datasets are pervasive in classification tasks and degrade classifier performance on minority samples. Oversampling is an effective remedy for class imbalance. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when it contains class-overlapping regions. This study proposes a new oversampling method, Constrained Oversampling, to reduce noise generation during oversampling. First, the algorithm extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of the minority regions. Third, oversampling under constraints synthesizes new samples to obtain a balanced dataset. The proposal differs from other techniques by incorporating constraints into the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The effectiveness of the method is explained by studying the impact of class overlapping on imbalanced learning.
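The three steps named in the abstract can be illustrated with a rough Python sketch. This is not the authors' algorithm: the abstract does not give its details, so the sketch makes its own assumptions. Overlap is approximated by a k-nearest-neighbour test, the Ant Colony Optimization boundary step is replaced by a plain distance-based rejection rule, and the names `overlap_mask`, `constrained_oversample`, `k`, and `max_tries` are hypothetical. It also assumes at least two minority samples.

```python
import numpy as np


def overlap_mask(X, y, k=5):
    """Flag samples whose k nearest neighbours include another class.

    Assumption: a simple stand-in for the overlap-extraction step.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    return (y[nn] != y[:, None]).any(axis=1)


def constrained_oversample(X, y, minority=1, k=5, seed=0, max_tries=10_000):
    """SMOTE-style interpolation with a rejection constraint.

    Assumption: a candidate is kept only if its nearest original sample
    is a minority sample. This replaces the ACO-defined boundary.
    """
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == minority], X[y != minority]
    need = len(X_maj) - len(X_min)         # samples required for balance
    synthetic = []
    for _ in range(max_tries):
        if len(synthetic) >= need:
            break
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]    # skip the point itself
        j = rng.choice(neighbours)
        cand = X_min[i] + rng.random() * (X_min[j] - X_min[i])
        d_min = np.linalg.norm(X_min - cand, axis=1).min()
        d_maj = np.linalg.norm(X_maj - cand, axis=1).min()
        if d_min < d_maj:                  # constraint: reject likely noise
            synthetic.append(cand)
    if not synthetic:
        return X, y
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_new, y_new
```

Calling `constrained_oversample(X, y)` on a feature matrix `X` and a label vector `y` returns the original rows followed by the accepted synthetic minority rows. If the rejection rule is too strict for a given dataset, the result may stay short of full balance once `max_tries` is exhausted.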
Pages: 91452-91465
Page count: 14
Related papers
50 in total
  • [21] A Multi-Schematic Classifier-Independent Oversampling Approach for Imbalanced Datasets
    Bej, Saptarshi
    Schulz, Kristian
    Srivastava, Prashant
    Wolfien, Markus
    Wolkenhauer, Olaf
    IEEE Access, 2021, 9 : 123358 - 123374
  • [22] Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets
    Saez, Jose A.
    Krawczyk, Bartosz
    Wozniak, Michal
    PATTERN RECOGNITION, 2016, 57 : 164 - 178
  • [23] Newton cooling theorem-based local overlapping regions cleaning and oversampling techniques for imbalanced datasets
    Tao, Liangliang
    Wang, Qingya
    Yu, Fen
    Cao, Hui
    Liang, Yage
    Luo, Huixia
    Guo, Jinghui
    NEUROCOMPUTING, 2025, 616
  • [24] Research on Oversampling Algorithm for Imbalanced Datasets Based On ARIMA Model
    Chen, Gang
    Guo, Xiaomei
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 2384 - 2389
  • [25] Iterative minority oversampling and its ensemble for ordinal imbalanced datasets
    Wang, Ning
    Zhang, Zhong-Liang
    Luo, Xing-Gang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [26] A Boundary-Information-Based Oversampling Approach to Improve Learning Performance for Imbalanced Datasets
    Li, Der-Chiang
    Shi, Qi-Shi
    Lin, Yao-San
    Lin, Liang-Sian
    ENTROPY, 2022, 24 (03)
  • [28] Selective oversampling approach for strongly imbalanced data
    Gnip P.
    Vokorokos L.
    Drotár P.
    PeerJ Computer Science, 2021, 7 : 1 - 22
  • [29] A new oversampling approach based differential evolution on the safe set for highly imbalanced datasets
    Zhang, Jiaoni
    Li, Yanying
    Zhang, Baoshuang
    Wang, Xialin
    Gong, Huanhuan
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [30] Development of a Neighborhood Based Adaptive Heterogeneous Oversampling Ensemble Classifier for Imbalanced Binary Class Datasets
    Subbulaxmi, S. Santha
    Arumugam, G.
    PERVASIVE COMPUTING AND SOCIAL NETWORKING, ICPCSN 2022, 2023, 475 : 353 - 361