Addressing the class-imbalance and class-overlap problems by a metaheuristic-based under-sampling approach

Cited by: 20
Authors
Soltanzadeh, Paria [1]
Feizi-Derakhshi, M. Reza [1]
Hashemzadeh, Mahdi [2, 3]
Affiliations
[1] Univ Tabriz, Fac Elect & Comp Engn, Dept Comp Engn, Tabriz, Iran
[2] Azarbaijan Shahid Madani Univ, Fac Informat Technol & Comp Engn, Azarshahr Rd, Tabriz 5375171379, Iran
[3] Azarbaijan Shahid Madani Univ, Artificial Intelligence & Machine Learning Res Lab, Tabriz, Iran
Keywords
Imbalanced classification; Imbalanced datasets; Class overlap; Class imbalance; Metaheuristic algorithms; Under-sampling; DATA-SETS; SMOTE; CLASSIFICATION; ENSEMBLES; DATASETS;
DOI
10.1016/j.patcog.2023.109721
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The problem of imbalanced class distribution in real-world datasets severely impairs the performance of classification algorithms. The learning task becomes more complicated and challenging when there is also the class-overlap problem in imbalanced data. This research tackles these problems by presenting an under-sampling approach based on a metaheuristic method in which the under-sampling problem is mapped into an optimization problem. The proposed approach aims to select an optimal subset of the majority samples to handle the imbalanced and the class-overlap problems simultaneously while avoiding the excessive elimination of majority samples, especially in overlapped regions. The quality of the generated solutions is evaluated by a classifier and optimized in an evolutionary process. Unlike most existing under-sampling methods, the majority samples are not removed only from the overlapped regions; the classifier performance determines the desired regions for eliminating the majority samples. Extensive experiments conducted on 66 synthetic and 24 real-world datasets with different imbalance ratios and overlapping degrees and two large high-dimensional datasets show a significant performance improvement from the proposed method compared to the competitors. © 2023 Elsevier Ltd. All rights reserved.
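The idea in the abstract, under-sampling cast as an optimization problem whose candidate solutions are scored by a classifier and improved in an evolutionary process, can be illustrated with a minimal sketch. The abstract does not name the metaheuristic, the classifier, or the fitness measure, so everything below is an assumption chosen for brevity: a simple genetic algorithm over keep/drop masks, a 1-nearest-neighbour classifier, and the G-mean on a held-out evaluation set. All function names (`toy_data`, `fitness`, `evolve_mask`) are hypothetical and this is not the authors' implementation.

```python
import math
import random

def toy_data(seed=0, n_major=40, n_minor=8):
    """Hypothetical 2-D data: two overlapping Gaussian classes (0 = majority)."""
    rng = random.Random(seed)
    draw = lambda c, k: [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for _ in range(k)]
    majority, minority = draw(0.0, n_major), draw(1.5, n_minor)
    eval_X = draw(0.0, n_major) + draw(1.5, n_minor)   # held-out evaluation set
    eval_y = [0] * n_major + [1] * n_minor
    return majority, minority, eval_X, eval_y

def nearest_label(x, points, labels):
    """1-NN prediction using squared Euclidean distance."""
    best = min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])))
    return labels[best]

def fitness(mask, majority, minority, eval_X, eval_y):
    """G-mean (geometric mean of per-class recall) of 1-NN trained on the
    kept majority samples plus all minority samples.
    Assumes the evaluation set contains both classes."""
    kept = [p for p, keep in zip(majority, mask) if keep]
    if not kept:                      # dropping every majority sample is invalid
        return 0.0
    train_X = kept + minority
    train_y = [0] * len(kept) + [1] * len(minority)
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in zip(eval_X, eval_y):
        totals[y] += 1
        hits[y] += nearest_label(x, train_X, train_y) == y
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))

def evolve_mask(majority, minority, eval_X, eval_y,
                pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Assumed genetic algorithm: each individual is a keep/drop mask over
    the majority samples; the better half survives unchanged (elitism)."""
    rng = random.Random(seed)
    n = len(majority)
    score = lambda m: fitness(m, majority, minority, eval_X, eval_y)
    # Start from the "keep everything" baseline plus random masks.
    population = [[True] * n] + [[rng.random() < 0.5 for _ in range(n)]
                                 for _ in range(pop_size - 1)]
    for _ in range(generations):
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each gene with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)

if __name__ == "__main__":
    maj, mino, ex, ey = toy_data()
    best = evolve_mask(maj, mino, ex, ey)
    print("kept majority samples:", sum(best), "of", len(maj))
    print("G-mean without under-sampling:", fitness([True] * len(maj), maj, mino, ex, ey))
    print("G-mean with evolved mask:", fitness(best, maj, mino, ex, ey))
```

Because the mask covers every majority sample rather than only those in overlapped regions, the classifier score alone decides where samples are removed, which mirrors the distinction the abstract draws against overlap-only methods. Seeding the population with the keep-everything mask and retaining the best individuals means the evolved mask never scores below the no-under-sampling baseline on the evaluation set used for fitness.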
Pages: 14
Related papers
49 in total
  • [1] Class-overlap undersampling based on Schur decomposition for Class-imbalance problems
    Dai, Qi
    Liu, Jian-wei
    Shi, Yong-hui
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 221
  • [2] Exploratory under-sampling for class-imbalance learning
    Liu, Xu-Ying
    Wu, Jianxin
    Zhou, Zhi-Hua
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 965 - 969
  • [3] A New Under-Sampling Method to Face Class Overlap and Imbalance
    Guzman-Ponce, Angelica
    Valdovinos, Rosa Maria
    Sanchez, Jose Salvador
    Marcial-Romero, Jose Raymundo
    APPLIED SCIENCES-BASEL, 2020, 10 (15):
  • [4] Handling Class-Imbalance with KNN (Neighbourhood) Under-Sampling for Software Defect Prediction
    Goyal, Somya
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (03) : 2023 - 2064
  • [5] Handling Class-Imbalance with KNN (Neighbourhood) Under-Sampling for Software Defect Prediction
    Somya Goyal
    Artificial Intelligence Review, 2022, 55 : 2023 - 2064
  • [6] Distance mapping overlap complexity metric for class-imbalance problems
    Dai, Qi
    Liu, Jian-wei
    Shi, Yong-hui
    APPLIED SOFT COMPUTING, 2024, 163
  • [7] A majority affiliation based under-sampling method for class imbalance problem
    Xie, Ying
    Huang, Xian
    Qin, Feng
    Li, Fagen
    Ding, Xuyang
    INFORMATION SCIENCES, 2024, 662
  • [8] A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
    Ping Gong
    Junguang Gao
    Li Wang
    Journal of Systems Science and Systems Engineering, 2022, 31 : 728 - 752
  • [9] A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
    Gong, Ping
    Gao, Junguang
    Wang, Li
    JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2022, 31 (06) : 728 - 752
  • [10] Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning
    Sikora, Riyaz
    Lee, Yoon Sang
    INFORMATION SYSTEMS FRONTIERS, 2024,