Addressing the class-imbalance and class-overlap problems by a metaheuristic-based under-sampling approach

Cited by: 20
Authors
Soltanzadeh, Paria [1]
Feizi-Derakhshi, M. Reza [1]
Hashemzadeh, Mahdi [2, 3]
Affiliations
[1] Univ Tabriz, Fac Elect & Comp Engn, Dept Comp Engn, Tabriz, Iran
[2] Azarbaijan Shahid Madani Univ, Fac Informat Technol & Comp Engn, Azarshahr Rd, Tabriz 5375171379, Iran
[3] Azarbaijan Shahid Madani Univ, Artificial Intelligence & Machine Learning Res Lab, Tabriz, Iran
Keywords
Imbalanced classification; Imbalanced datasets; Class overlap; Class imbalance; Metaheuristic algorithms; Under-sampling; DATA-SETS; SMOTE; CLASSIFICATION; ENSEMBLES; DATASETS;
DOI
10.1016/j.patcog.2023.109721
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
The problem of imbalanced class distribution in real-world datasets severely impairs the performance of classification algorithms. The learning task becomes more complicated and challenging when there is also the class-overlap problem in imbalanced data. This research tackles these problems by presenting an under-sampling approach based on a metaheuristic method in which the under-sampling problem is mapped into an optimization problem. The proposed approach aims to select an optimal subset of the majority samples to handle the imbalanced and the class-overlap problems simultaneously while avoiding the excessive elimination of majority samples, especially in overlapped regions. The quality of the generated solutions is evaluated by a classifier and optimized in an evolutionary process. Unlike most existing under-sampling methods, the majority samples are not removed only from the overlapped regions; the classifier performance determines the desired regions for eliminating the majority samples. Extensive experiments conducted on 66 synthetic and 24 real-world datasets with different imbalance ratios and overlapping degrees and two large high-dimensional datasets show a significant performance improvement from the proposed method compared to the competitors. © 2023 Elsevier Ltd. All rights reserved.
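The idea in the abstract of treating under-sampling as an optimization problem, with a classifier scoring each candidate subset of majority samples, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the simple genetic algorithm stands in for whichever metaheuristic the authors use, the 1-nearest-neighbour classifier and the G-mean fitness are placeholders, and the names `evolve_mask`, `fitness`, and `g_mean` are hypothetical.

```python
import math
import random


def nearest_label(point, samples):
    """1-NN prediction: label of the closest (features, label) pair."""
    return min(samples, key=lambda s: math.dist(point, s[0]))[1]


def g_mean(train, validation):
    """Geometric mean of per-class recall of a 1-NN classifier.

    Assumes the validation set contains both classes (labels 0 and 1).
    """
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for x, y in validation:
        totals[y] += 1
        hits[y] += nearest_label(x, train) == y
    return math.sqrt(hits[0] / totals[0] * hits[1] / totals[1])


def fitness(mask, majority, minority, validation):
    """Score a binary mask over the majority samples (True = keep)."""
    kept = [x for x, keep in zip(majority, mask) if keep]
    if not kept:  # a solution that deletes the whole majority class is useless
        return 0.0
    train = [(x, 0) for x in kept] + [(x, 1) for x in minority]
    return g_mean(train, validation)


def evolve_mask(majority, minority, validation, pop_size=20,
                generations=30, mutation_rate=0.05, seed=0):
    """Search for a majority-sample mask with a basic elitist GA.

    Illustrative stand-in for the metaheuristic; not the paper's algorithm.
    """
    rng = random.Random(seed)
    n = len(majority)

    def score(mask):
        return fitness(mask, majority, minority, validation)

    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    # Seed the "keep everything" solution so the search never ends up
    # worse than doing no under-sampling at all.
    population.append([True] * n)
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation on each gene with a small probability.
            children.append([(not g) if rng.random() < mutation_rate else g
                             for g in child])
        population = parents + children
    return max(population, key=score)
```

Because the mask is scored only by classifier performance, nothing in this sketch restricts removals to overlapped regions; which majority samples are dropped is decided entirely by the fitness, which mirrors the distinction the abstract draws from overlap-only methods. In practice the fitness would be computed on held-out data or by cross-validation rather than on a single validation split.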
Pages: 14
Related papers
49 in total
  • [21] SOM-US: A Novel Under-Sampling Technique for Handling Class Imbalance Problem
    Kumar, Ajay
    JOURNAL OF COMMUNICATIONS SOFTWARE AND SYSTEMS, 2024, 20 (01) : 69 - 75
  • [22] Evolutionary simultaneous under and oversampling of instances for dealing with class-imbalance datasets in multilabel problems
    Garcia-Pedrajas, Nicolas
    Cuevas-Munoz, Jose M.
    de Haro-Garcia, Aida
    APPLIED SOFT COMPUTING, 2024, 159
  • [23] A hybrid sampling algorithm for imbalanced and class-overlap data based on natural neighbors and density estimation
    Li, Xinqi
    Liu, Qicheng
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (03) : 2259 - 2290
  • [24] SWSEL: Sliding Window-based Selective Ensemble Learning for class-imbalance problems
    Dai, Qi
    Liu, Jian-wei
    Yang, Jia-Peng
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 121
  • [25] DBIG-US: A two-stage under-sampling algorithm to face the class imbalance problem
    Guzman-Ponce, A.
    Sanchez, J. S.
    Valdovinos, R. M.
    Marcial-Romero, J. R.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 168
  • [26] An Approach to Class Imbalance Problem Based on Stacking and Inverse Random Under Sampling Methods
    Zhang, Yuwei
    Liu, Guanjun
    Luan, Wenjing
    Yan, Chungang
    Jiang, Changjun
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2018
  • [27] High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM
    Stegmayer, Georgina
    Yones, Cristian
    Kamenetzky, Laura
    Milone, Diego H.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2017, 14 (06) : 1316 - 1326
  • [28] A density-based oversampling approach for class imbalance and data overlap
    Zhang, Ruizhi
    Lu, Shaowu
    Yan, Baokang
    Yu, Puliang
    Tang, Xiaoqi
    COMPUTERS & INDUSTRIAL ENGINEERING, 2023, 186
  • [29] A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data
    Guzman-Ponce, A.
    Valdovinos, R. M.
    Sanchez, J. S.
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 299 - 311
  • [30] Alleviating Class Imbalance Issue in Software Fault Prediction Using DBSCAN-Based Induced Graph Under-Sampling Method
    Bhandari, Kirti
    Kumar, Kuldeep
    Sangal, Amrit Lal
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024, 49 (09) : 12589 - 12627