An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine

Cited by: 2
Authors
Zhu, Bo [1]
Jing, Xiaona [1]
Qiu, Lan [1]
Li, Runbo [1]
Affiliation
[1] Kunming Univ Sci & Technol, Coll Mech & Elect Engn, Kunming 650500, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024, Vol. 79, No. 3
Keywords
Imbalanced data classification; Silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM; SMOTE; MODEL;
DOI
10.32604/cmc.2024.048062
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In classification, data imbalance refers to the scenario in which one class has significantly more samples than the other. Data imbalance biases the trained classifier toward the majority class (usually defined as the negative class), which can harm accuracy on the minority class (usually defined as the positive class) and in turn degrade the overall performance of the model. This article proposes MSHR-FCSSVM, a method for imbalanced data classification that combines a new hybrid resampling approach (MSHR) with a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample by its Silhouette value, computed from the Mahalanobis distances between samples. Based on these values, so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (the over-sampling step), and finally deleted (the under-sampling step). The approach thus replaces pseudo-negative samples with newly generated positive samples one by one, clearing up the inter-class overlap on the borderline without changing the overall size of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It accounts for the influence of both the imbalance in sample numbers and the class distribution on classification, and it finely tunes the class cost weights with the RIME algorithm, an efficient optimization algorithm based on the physical phenomenon of rime ice, using cross-validation accuracy as the fitness function, so that the classification borderline is adjusted accurately. To verify the effectiveness of the proposed method, a series of experiments is carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that MSHR-FCSSVM outperforms the compared methods in most cases, and that both MSHR and FCSSVM play significant roles.
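
The resampling idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the covariance is pooled over all samples, a negative sample is treated as pseudo-negative when its Silhouette value falls below a hypothetical `threshold` of 0, and each new positive sample is interpolated between the positive sample nearest to the pseudo-negative and a randomly chosen positive sample. The function names are likewise hypothetical.

```python
import numpy as np

def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances (assumption: covariance pooled over all of X)."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    vi = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def negative_silhouettes(D, neg, pos):
    """Silhouette value of each negative sample in a two-class problem."""
    s = np.zeros(len(neg))
    for k, i in enumerate(neg):
        a = D[i, neg[neg != i]].mean()  # mean distance to the other negatives
        b = D[i, pos].mean()            # mean distance to the positives
        denom = max(a, b)
        s[k] = (b - a) / denom if denom > 0 else 0.0
    return s

def hybrid_resample(X, y, threshold=0.0, seed=0):
    """Replace each pseudo-negative (label 0) with one synthetic positive (label 1)."""
    rng = np.random.default_rng(seed)
    D = mahalanobis_matrix(X)
    neg = np.flatnonzero(y == 0)
    pos = np.flatnonzero(y == 1)
    s = negative_silhouettes(D, neg, pos)
    pseudo = neg[s < threshold]  # assumed screening rule, not taken from the paper

    synthetic = []
    for i in pseudo:
        anchor = X[pos[np.argmin(D[i, pos])]]  # positive nearest to the pseudo-negative
        mate = X[rng.choice(pos)]              # random positive (assumed endpoint)
        synthetic.append(anchor + rng.random() * (mate - anchor))  # linear interpolation

    keep = np.setdiff1d(np.arange(len(X)), pseudo)  # under-sampling: drop pseudo-negatives
    synthetic = np.asarray(synthetic, dtype=float).reshape(-1, X.shape[1])
    X_new = np.vstack([X[keep], synthetic])
    y_new = np.concatenate([y[keep], np.ones(len(pseudo), dtype=y.dtype)])
    return X_new, y_new
```

Because every deleted pseudo-negative is matched by exactly one generated positive, `X_new` has the same number of rows as `X`, which mirrors the abstract's statement that the overall size of the dataset is unchanged. The cost-weight tuning of FCSSVM is not covered by this sketch.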
Pages: 3977-3999
Page count: 23