A new sampling method for classifying imbalanced data based on support vector machine ensemble

Cited by: 96
Authors
Jian, Chuanxia [1]
Gao, Jian [1]
Ao, Yinhui [1]
Affiliation
[1] Guangdong Univ Technol, Sch Electromech Engn, Key Lab Mech Equipment Mfg & Control Technol, Minist Educ, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Sampling; Support vector machine; Classification
DOI
10.1016/j.neucom.2016.02.006
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because minority examples are scarce, the information they carry cannot accurately represent the inherent structure of the dataset, so existing classification methods achieve low prediction accuracy on the minority class. Over-sampling and under-sampling methods help to raise the prediction accuracy on the minority class. However, these methods either discard information that is important for classification or add trivial information, which limits the prediction accuracy on the minority class. Therefore, this paper proposes a new different contribution sampling (DCS) method, based on the different contributions of the support vectors (SVs) and the non-support vectors (NSVs) to classification. DCS uses the biased support vector machine (B-SVM) to identify the SVs and NSVs of an imbalanced dataset and then applies a different sampling method to each group: the synthetic minority over-sampling technique (SMOTE) re-samples the SVs of the minority class, and random under-sampling (RUS) re-samples the NSVs of the majority class. Examples are then labeled by an ensemble of support vector machines (SVMen). Experiments are carried out on imbalanced datasets selected from the UCI, AVU06a, Statlog, DP01a, JP98a and CWH03a repositories. Experimental results show that on these imbalanced datasets the proposed DCS method achieves better Receiver Operating Characteristic (ROC) curve performance than the other methods, and improves the geometric mean prediction accuracy (G-mean) by 20.80%, 5.97%, 8.66% and 9.35% compared with NS, US, SMOTE and ROS, respectively. (C) 2016 Elsevier B.V. All rights reserved.
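The resampling step described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-example SV/NSV flags (`is_sv`) are assumed to come from a B-SVM trained elsewhere (no SVM is implemented here), the minority class is labelled 1, the amounts `n_synthetic` and `majority_nsv_keep` are hypothetical parameters, SMOTE neighbours are searched among the minority SVs only, and minority NSVs and majority SVs are assumed to be kept unchanged. The `g_mean` helper assumes the usual definition of G-mean as the square root of sensitivity times specificity.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def smote(samples: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """SMOTE-style interpolation: each synthetic point lies on the segment
    between a randomly chosen seed and one of its k nearest neighbours."""
    if rng is None:
        rng = random.Random(0)
    if len(samples) < 2:
        return []  # no neighbour to interpolate towards
    k = min(k, len(samples) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        seed = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(s, seed))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(seed, nb)))
    return synthetic


def random_undersample(samples: Sequence[Point], n_keep: int,
                       rng: random.Random) -> List[Point]:
    """Random under-sampling (RUS): keep n_keep examples chosen without replacement."""
    n_keep = max(0, min(n_keep, len(samples)))
    return rng.sample(list(samples), n_keep)


def dcs_resample(X: Sequence[Point], y: Sequence[int], is_sv: Sequence[bool],
                 n_synthetic: int, majority_nsv_keep: int, k: int = 5,
                 seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical DCS-style resampling. Labels: 1 = minority, 0 = majority.
    is_sv is assumed to come from a B-SVM trained elsewhere (not implemented here)."""
    rng = random.Random(seed)
    groups = {(lab, sv): [] for lab in (0, 1) for sv in (False, True)}
    for x, lab, sv in zip(X, y, is_sv):
        groups[(lab, bool(sv))].append(x)
    min_sv, min_nsv = groups[(1, True)], groups[(1, False)]
    maj_sv, maj_nsv = groups[(0, True)], groups[(0, False)]
    # Minority: keep all originals, add SMOTE points generated from minority SVs.
    minority = min_nsv + min_sv + smote(min_sv, n_synthetic, k, rng)
    # Majority: keep all SVs, randomly under-sample the NSVs.
    majority = maj_sv + random_undersample(maj_nsv, majority_nsv_keep, rng)
    return minority + majority, [1] * len(minority) + [0] * len(majority)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

For example, given SV flags from a fitted B-SVM, `dcs_resample(X, y, is_sv, n_synthetic=100, majority_nsv_keep=200)` returns a rebalanced training set on which the ensemble members could be trained, and `g_mean` can then score their predictions. The abstract does not state the sampling amounts or the neighbour count, so these values are placeholders. Keeping the majority SVs intact reflects the idea that they lie near the decision boundary, while only the NSVs far from it are thinned out.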
Pages: 115-122
Number of pages: 8