An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine

Cited by: 2
Authors
Zhu, Bo [1]
Jing, Xiaona [1]
Qiu, Lan [1]
Li, Runbo [1]
Affiliations
[1] Kunming Univ Sci & Technol, Coll Mech & Elect Engn, Kunming 650500, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 79 / No. 03
Keywords
Imbalanced data classification; Silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM; SMOTE; MODEL;
DOI
10.32604/cmc.2024.048062
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In classification modeling, data imbalance refers to the scenario in which one class has significantly more samples than the other. Data imbalance biases the trained classification model toward the majority class (usually defined as the negative class), which can harm accuracy on the minority class (usually defined as the positive class) and in turn degrade the overall performance of the model. This article proposes a method for imbalanced data classification called MSHR-FCSSVM, which is based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample through its Silhouette value, calculated from the Mahalanobis distance between samples. Based on these values, the so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (over-sampling step), and finally deleted (under-sampling step). This approach replaces pseudo-negative samples with newly generated positive samples one by one, clearing up the inter-class overlap on the borderline without changing the overall scale of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It considers the influence of both the imbalance in sample numbers and the class distribution on classification, and it accurately adjusts the classification borderline by finely tuning the class cost weights with the RIME algorithm, an efficient optimization algorithm based on the physical phenomenon of rime ice, using cross-validation accuracy as the fitness function. To verify the effectiveness of the proposed method, a series of experiments was carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that MSHR-FCSSVM performs better than the compared methods in most cases, and that both MSHR and FCSSVM play significant roles.
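The resampling idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not state how pseudo-negative samples are thresholded or which samples are interpolated, so the sketch assumes (hypothetically) that a negative sample is pseudo-negative when its Silhouette value falls below `threshold=0.0`, and that each new positive sample is interpolated between the pseudo-negative sample and its nearest positive sample. The function names `mahalanobis_matrix`, `negative_silhouettes`, and `hybrid_resample` are made up for illustration, and the RIME-based cost-weight tuning of FCSSVM is not covered.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances, using the covariance of all samples in X."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = X[:, None, :] - X[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def negative_silhouettes(D, y):
    """Silhouette value of each negative (label 0) sample in a two-class problem."""
    neg = np.flatnonzero(y == 0)
    pos = np.flatnonzero(y == 1)
    # a: mean distance to the other negatives (self-distance is 0, so divide by n - 1)
    a = D[np.ix_(neg, neg)].sum(axis=1) / max(len(neg) - 1, 1)
    # b: mean distance to the positives (the only other class)
    b = D[np.ix_(neg, pos)].mean(axis=1)
    return neg, (b - a) / np.maximum(np.maximum(a, b), 1e-12)


def hybrid_resample(X, y, threshold=0.0, seed=0):
    """Replace each pseudo-negative sample with one interpolated positive sample.

    Assumptions (not from the source): the threshold rule and the choice of
    the nearest positive sample as the interpolation partner.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    D = mahalanobis_matrix(X)
    neg, s = negative_silhouettes(D, y)
    pseudo = neg[s < threshold]
    pos = np.flatnonzero(y == 1)
    X_new, y_new = X.copy(), y.copy()
    for i in pseudo:
        nearest = pos[np.argmin(D[i, pos])]
        lam = rng.random()  # interpolation factor in [0, 1)
        # Over-sampling: a new positive on the segment between the two samples.
        X_new[i] = X[nearest] + lam * (X[i] - X[nearest])
        # Under-sampling: the pseudo-negative is overwritten, so the size is unchanged.
        y_new[i] = 1
    return X_new, y_new, pseudo


# Toy data: five negatives near the origin, one negative planted inside the
# positive cluster (index 5), and four positives.
X_demo = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
    [5.5, 5.5],
    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0],
])
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
X_res, y_res, pseudo_idx = hybrid_resample(X_demo, y_demo)
```

Because every pseudo-negative sample is overwritten in place, the dataset keeps its size while the class counts shift toward the positive class, which mirrors the one-by-one replacement described in the abstract.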
Pages: 3977 - 3999
Page count: 23
Related papers
50 in total
  • [31] Cost-Sensitive Large margin Distribution Machine for classification of imbalanced data
    Cheng, Fanyong
    Zhang, Jing
    Wen, Cuihong
    PATTERN RECOGNITION LETTERS, 2016, 80 : 107 - 112
  • [32] Large cost-sensitive margin distribution machine for imbalanced data classification
    Cheng, Fanyong
    Zhang, Jing
    Wen, Cuihong
    Liu, Zhaohua
    Li, Zuoyong
    NEUROCOMPUTING, 2017, 224 : 45 - 57
  • [33] Imbalanced classification using support vector machine ensemble
    Jiang Tian
    Hong Gu
    Wenqi Liu
    Neural Computing and Applications, 2011, 20 : 203 - 209
  • [34] Improving Classification with Cost-Sensitive Approach and Support Vector Machine
    Muntean, Maria
    Ileana, Ioan
    Rotar, Corina
    Valean, Honoriu
    9TH ROEDUNET IEEE INTERNATIONAL CONFERENCE, 2010, : 180 - +
  • [35] Imbalanced classification using support vector machine ensemble
    Tian, Jiang
    Gu, Hong
    Liu, Wenqi
    NEURAL COMPUTING & APPLICATIONS, 2011, 20 (02): : 203 - 209
  • [36] Hybrid Support Vector Machine based Feature Selection Method for Text Classification
    Sabbah, Thabit
    Ayyash, Mosab
    Ashraf, Mahmood
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (3A) : 599 - 609
  • [37] A Combination of Resampling and Ensemble Method for Text Classification on Imbalanced Data
    Feng, Haijun
    Qin, Wen
    Wang, Huijing
    Li, Yi
    Hu, Guangwu
    BIG DATA, BIGDATA 2021, 2022, 12988 : 3 - 16
  • [38] Deep Learning-Based Imbalanced Classification With Fuzzy Support Vector Machine
    Wang, Ke-Fan
    An, Jing
    Wei, Zhen
    Cui, Can
    Ma, Xiang-Hua
    Ma, Chao
    Bao, Han-Qiu
    FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2022, 9
  • [39] A Segmented Local Offset Method for Imbalanced Data Classification Using Quasi-Linear Support Vector Machine
    Liang, Peifeng
    Yuan, Xin
    Li, Weite
    Hu, Jinglu
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 746 - 751
  • [40] Imbalanced Data Classification using Complementary Fuzzy Support Vector Machine Techniques and SMOTE
    Pruengkarn, Ratchakoon
    Wong, Kok Wai
    Fung, Chun Che
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 978 - 983