Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

Cited by: 17
Authors
Liu, Changhui [1]
Jin, Sun [2]
Wang, Donghong [3]
Luo, Zichao [4]
Yu, Jianbo [1]
Zhou, Binghai [1]
Yang, Changlin [5]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
DOI
10.1109/ACCESS.2020.3018911
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Imbalanced datasets are pervasive in classification tasks and degrade classifier performance on minority samples. Oversampling is effective in resolving the class imbalance problem. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when the dataset contains class-overlapping regions. In this study, a new oversampling method named Constrained Oversampling is proposed to reduce noise generation in oversampling. The algorithm first extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of minority regions. Third, oversampling under constraints is employed to synthesize new samples and obtain a balanced dataset. Our proposal distinguishes itself from other techniques by incorporating constraints into the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The effectiveness of the method is explained by studying the impact of class overlapping on imbalanced learning.
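The general idea of oversampling under a constraint can be illustrated with a short sketch. This is not the authors' algorithm: the abstract does not give its details, and the Ant Colony Optimization boundary step is replaced here by a much simpler, assumed rule (a synthetic point is kept only if most of its k nearest original samples belong to the minority class). The function names `knn_indices` and `constrained_oversample` and all parameters are hypothetical.

```python
import numpy as np


def knn_indices(X, q, k):
    """Return indices of the k rows of X closest to point q (Euclidean)."""
    d = np.linalg.norm(X - q, axis=1)
    return np.argsort(d)[:k]


def constrained_oversample(X, y, minority=1, k=3, max_tries=10_000, seed=0):
    """SMOTE-style interpolation with a rejection constraint.

    ASSUMPTION: the k-NN acceptance rule below is a simple stand-in for the
    paper's ACO-defined minority boundaries, not a reimplementation of them.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_needed = int(np.sum(y != minority)) - len(X_min)

    synthetic = []
    tries = 0
    while len(synthetic) < n_needed and tries < max_tries:
        tries += 1
        a = X_min[rng.integers(len(X_min))]
        # Neighbours of a within the minority class; position 0 is normally
        # a itself (distance 0), so it is skipped.
        nbrs = knn_indices(X_min, a, min(k + 1, len(X_min)))[1:]
        b = X_min[rng.choice(nbrs)]
        candidate = a + rng.random() * (b - a)  # point on the segment a-b
        # Constraint: reject candidates that land where the original data
        # is mostly majority-class, since they would likely act as noise.
        near = knn_indices(X, candidate, k)
        if 2 * np.sum(y[near] == minority) > k:
            synthetic.append(candidate)

    if not synthetic:
        return X.copy(), y.copy()
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_new, y_new
```

Calling `constrained_oversample(X, y)` on a feature matrix `X` and label vector `y` returns the original samples followed by the accepted synthetic ones. If `max_tries` is exhausted in a heavily overlapped dataset, the result may remain imbalanced, which is the price of refusing to generate points in majority-dominated regions.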
Pages: 91452-91465
Page count: 14
Related papers
50 in total
  • [41] An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets
    Thejas, G. S.
    Hariprasad, Yashas
    Iyengar, S. S.
    Sunitha, N. R.
    Badrinath, Prajwal
    Chennupati, Shasank
    MACHINE LEARNING WITH APPLICATIONS, 2022, 8
  • [42] NCLWO: Newton's cooling law-based weighted oversampling algorithm for imbalanced datasets with feature noise
    Tao, Liangliang
    Wang, Qingya
    Zhu, Zhicheng
    Yu, Fen
    Yin, Xia
    NEUROCOMPUTING, 2024, 610
  • [43] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Mohammed H. IBRAHIM
    Neural Computing and Applications, 2021, 33 : 15781 - 15806
  • [44] A-RDBOTE: an improved oversampling technique for imbalanced credit-scoring datasets
    Sudhansu R. Lenka
    Sukant Kishoro Bisoy
    Rojalina Priyadarshini
    Risk Management, 2023, 25
  • [45] A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets
    Song, Xudong
    Chen, Yilin
    Liang, Pan
    Wan, Xiaohui
    Cui, Yunxian
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 3245 - 3259
  • [46] A-RDBOTE: an improved oversampling technique for imbalanced credit-scoring datasets
    Lenka, Sudhansu R.
    Bisoy, Sukant Kishoro
    Priyadarshini, Rojalina
    RISK MANAGEMENT-AN INTERNATIONAL JOURNAL, 2023, 25 (04):
  • [47] An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets
    Kovacs, Gyorgy
    APPLIED SOFT COMPUTING, 2019, 83
  • [48] Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering
    Mujahid, Muhammad
    Kina, Erol
    Rustam, Furqan
    Villar, Monica Gracia
    Alvarado, Eduardo Silva
    Diez, Isabel De La Torre
    Ashraf, Imran
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [49] An Adaptive and Robust Method for Oriented Oversampling With Spatial Information for Imbalanced Noisy Datasets
    Deng, Yi
    Li, Mingyong
    IEEE ACCESS, 2023, 11 : 122610 - 122624
  • [50] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Ibrahim, Mohammed H.
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (22): : 15781 - 15806