Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

Cited by: 17
Authors
Liu, Changhui [1]
Jin, Sun [2]
Wang, Donghong [3]
Luo, Zichao [4]
Yu, Jianbo [1]
Zhou, Binghai [1]
Yang, Changlin [5]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
DOI
10.1109/ACCESS.2020.3018911
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced datasets are pervasive in classification tasks and degrade classifier performance on minority samples. Oversampling is an effective remedy for class imbalance. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when it contains class-overlapping regions. This study proposes a new oversampling method, Constrained Oversampling, to reduce noise generation during oversampling. The algorithm first extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of the minority regions. Third, oversampling under constraints synthesizes new samples to obtain a balanced dataset. The proposal differs from other techniques by incorporating constraints into the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The effectiveness of the method is explained by studying the impact of class overlapping on imbalanced learning.
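The abstract gives only the outline of the three steps, so the following Python sketch is an illustration of the general idea rather than the authors' algorithm. It assumes a k-nearest-neighbour test for overlap, SMOTE-style interpolation between minority samples, and a simple neighbour-vote acceptance rule that stands in for the Ant Colony Optimization boundary step, which is not reproduced here. The names `overlap_mask`, `constrained_oversample`, and all parameters are hypothetical.

```python
import numpy as np


def overlap_mask(X, y, k=3):
    """Step 1 (simplified): flag samples that have at least one
    other-class point among their k nearest neighbours."""
    mask = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the sample itself
        nn = np.argsort(d)[:k]
        mask[i] = np.any(y[nn] != y[i])
    return mask


def constrained_oversample(X, y, minority=1, k=3, seed=0, max_tries=10_000):
    """Steps 2-3 (simplified): interpolate between minority samples and
    keep a candidate only if most of its k nearest original neighbours
    are minority. This vote is an assumed stand-in for the ACO-defined
    minority boundaries, not the published constraint."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_needed = int(np.sum(y != minority)) - len(X_min)
    k_min = min(k, len(X_min) - 1)
    if k_min < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    tries = 0
    while len(synthetic) < n_needed and tries < max_tries:
        tries += 1
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf
        j = rng.choice(np.argsort(d)[:k_min])  # random minority neighbour
        candidate = X_min[i] + rng.random() * (X_min[j] - X_min[i])

        # Constraint: reject candidates that land in majority territory.
        nn = np.argsort(np.linalg.norm(X - candidate, axis=1))[:k]
        if np.mean(y[nn] == minority) > 0.5:
            synthetic.append(candidate)

    # If max_tries is exhausted, the result may remain imbalanced.
    if not synthetic:
        return X.copy(), y.copy()
    X_bal = np.vstack([X, np.array(synthetic)])
    y_bal = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_bal, y_bal


# Toy data (made up for illustration): a 5 x 4 majority grid and a
# small, well-separated minority cluster.
majority_pts = np.array([[x, y] for x in range(5) for y in range(4)], dtype=float)
minority_pts = np.array([[10, 10], [10, 11], [11, 10],
                         [11, 11], [10.5, 10.5], [9.5, 10.5]], dtype=float)
X_demo = np.vstack([majority_pts, minority_pts])
y_demo = np.array([0] * len(majority_pts) + [1] * len(minority_pts))

X_bal, y_bal = constrained_oversample(X_demo, y_demo, minority=1, k=3)
```

Rejecting candidates after generation keeps the sketch short. The abstract suggests the published method instead constrains where samples may be synthesized using the boundaries found in the second step.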
Pages: 91452-91465
Page count: 14
Related papers
50 in total
  • [2] LoRAS: an oversampling approach for imbalanced datasets
    Bej, Saptarshi
    Davtyan, Narek
    Wolfien, Markus
    Nassar, Mariam
    Wolkenhauer, Olaf
    MACHINE LEARNING, 2021, 110 (02) : 279 - 301
  • [3] Triplets Oversampling for Class Imbalanced Federated Datasets
    Xiao, Chenguang
    Wang, Shuo
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT II, 2023, 14170 : 368 - 383
  • [4] Overlap to equilibrium: Oversampling imbalanced datasets using overlapping degree
    Jubair, Sidra
    Yang, Jie
    Ali, Bilal
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (02)
  • [5] Minority Oversampling in Kernel Adaptive Subspaces for Class Imbalanced Datasets
    Lin, Chin-Teng
    Hsieh, Tsung-Yu
    Liu, Yu-Ting
    Lin, Yang-Yin
    Fang, Chieh-Ning
    Wang, Yu-Kai
    Yen, Gary
    Pal, Nikhil R.
    Chuang, Chun-Hsiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (05) : 950 - 962
  • [6] An Adaptive Oversampling Technique for Imbalanced Datasets
    Shahee, Shaukat Ali
    Ananthakumar, Usha
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 1 - 16
  • [7] KNNOR: An oversampling technique for imbalanced datasets
    Islam, Ashhadul
    Belhaouari, Samir Brahim
    Rehman, Atiq Ur
    Bensmail, Halima
    APPLIED SOFT COMPUTING, 2022, 115
  • [8] Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques
    Pristyanto, Yoga
    Nugraha, Anggit Ferdita
    Pratama, Irfan
    Dahlan, Akhmad
    Wirasakti, Lucky Adhikrisna
    PROCEEDINGS OF THE 2021 15TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2021), 2021
  • [9] AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets
    Guan, Shaopeng
    Zhao, Xiaoyan
    Xue, Yuewei
    Pan, Hao
    INFORMATION SCIENCES, 2024, 663
  • [10] An oversampling algorithm for high-dimensional imbalanced learning with class overlapping
    Yang, Xu
    Xue, Zhen
    Zhang, Liangliang
    Wu, Jianzhen
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (02) : 1915 - 1943